






APPLIED NUMERICAL 
METHODS USING 
MATLAB® 


Won Young Yang 

Chung-Ang University, Korea 

Wenwu Cao 

Pennsylvania State University 

Tae-Sang Chung 

Chung-Ang University, Korea 

John Morris 

The University of Auckland, New Zealand 


WILEY-INTERSCIENCE


A JOHN WILEY & SONS, INC., PUBLICATION 




Questions about the contents of this book can be mailed to wyyang@cau.ac.kr. 

MATLAB® and Simulink® are trademarks of The MathWorks, Inc. and are used with
permission. The MathWorks does not warrant the accuracy of the text or exercises in this book.
This book’s use or discussion of MATLAB® and Simulink® software or related products does not 
constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or 
particular use of the MATLAB® and Simulink® software. 

Copyright © 2005 by John Wiley & Sons, Inc. All rights reserved. 

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. 

Published simultaneously in Canada. 

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any 
form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, 
except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without 
either the prior written permission of the Publisher, or authorization through payment of the 
appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, 
MA 01923, 978-750-8400, fax 978-646-8600, or on the web at www.copyright.com. Requests to 
the Publisher for permission should be addressed to the Permissions Department, John Wiley & 
Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.

Limit of Liability/Disclaimer of Warranty: While the publisher and authors have used their best 
efforts in preparing this book, they make no representations or warranties with respect to the 
accuracy or completeness of the contents of this book and specifically disclaim any implied 
warranties of merchantability or fitness for a particular purpose. No warranty may be created or 
extended by sales representatives or written sales materials. The advice and strategies contained 
herein may not be suitable for your situation. You should consult with a professional where 
appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other 
commercial damages, including but not limited to special, incidental, consequential, or other 
damages. 

For general information on our other products and services please contact our Customer Care 
Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or 
fax 317-572-4002. 

Wiley also publishes its books in a variety of electronic formats. Some content that appears in 
print, however, may not be available in electronic format. 

Library of Congress Cataloging-in-Publication Data 
Yang, Won-young, 1953- 

Applied numerical methods using MATLAB® / Won Y. Yang, Wenwu Cao, Tae S. Chung, John Morris.

Includes bibliographical references and index.

ISBN 0-471-69833-4 (cloth)

1. Numerical analysis-Data processing. 2. MATLAB. I. Cao, Wenwu. II. Chung, Tae-sang, 1952- III. Title.

QA297.Y36 2005 
518-dc22 

2004013108 

Printed in the United States of America. 


10 9 8 7 6 5 4 3 2






To our parents and families 
who love and support us 
and 

to our teachers and students 
who enriched our knowledge 



CONTENTS 


Preface xiii 

1 MATLAB Usage and Computational Errors 1 

1.1 Basic Operations of MATLAB / 1 

1.1.1 Input/Output of Data from MATLAB Command 
Window / 2 

1.1.2 Input/Output of Data Through Files / 2 

1.1.3 Input/Output of Data Using Keyboard / 4 

1.1.4 2-D Graphic Input/Output / 5 

1.1.5 3-D Graphic Output / 10 

1.1.6 Mathematical Functions / 10 

1.1.7 Operations on Vectors and Matrices / 15 

1.1.8 Random Number Generators / 22 

1.1.9 Flow Control / 24 

1.2 Computer Errors Versus Human Mistakes / 27 

1.2.1 IEEE 64-bit Floating-Point Number Representation / 28

1.2.2 Various Kinds of Computing Errors / 31 

1.2.3 Absolute/Relative Computing Errors / 33 

1.2.4 Error Propagation / 33 

1.2.5 Tips for Avoiding Large Errors / 34 

1.3 Toward Good Program / 37 

1.3.1 Nested Computing for Computational Efficiency / 37 

1.3.2 Vector Operation Versus Loop Iteration / 39 

1.3.3 Iterative Routine Versus Nested Routine / 40 

1.3.4 To Avoid Runtime Error / 40 

1.3.5 Parameter Sharing via Global Variables / 44 

1.3.6 Parameter Passing Through Varargin / 45 

1.3.7 Adaptive Input Argument List / 46 
Problems / 46 




2 System of Linear Equations 

2.1 Solution for a System of Linear Equations / 72 

2.1.1 The Nonsingular Case (M = N) / 72 

2.1.2 The Underdetermined Case ( M < N): Minimum-Norm 
Solution / 72 

2.1.3 The Overdetermined Case ( M > N): Least-Squares Error 
Solution / 75 

2.1.4 RLSE (Recursive Least-Squares Estimation) / 76 

2.2 Solving a System of Linear Equations / 79 

2.2.1 Gauss Elimination / 79

2.2.2 Partial Pivoting / 81 

2.2.3 Gauss-Jordan Elimination / 89 

2.3 Inverse Matrix / 92 

2.4 Decomposition (Factorization) / 92 

2.4.1 LU Decomposition (Factorization): Triangularization / 92

2.4.2 Other Decomposition (Factorization): Cholesky, QR, 
and SVD / 97 

2.5 Iterative Methods to Solve Equations / 98 

2.5.1 Jacobi Iteration / 98 

2.5.2 Gauss-Seidel Iteration / 100 

2.5.3 The Convergence of Jacobi and Gauss-Seidel 
Iterations / 103 

Problems / 104 

3 Interpolation and Curve Fitting 

3.1 Interpolation by Lagrange Polynomial / 117 

3.2 Interpolation by Newton Polynomial / 119 

3.3 Approximation by Chebyshev Polynomial / 124 

3.4 Pade Approximation by Rational Function / 129 

3.5 Interpolation by Cubic Spline / 133 

3.6 Hermite Interpolating Polynomial / 139 

3.7 Two-dimensional Interpolation / 141 

3.8 Curve Fitting / 143 

3.8.1 Straight Line Fit: A Polynomial Function of First 
Degree / 144 

3.8.2 Polynomial Curve Fit: A Polynomial Function of Higher 
Degree / 145 

3.8.3 Exponential Curve Fit and Other Functions / 149 

3.9 Fourier Transform / 150 

3.9.1 FFT Versus DFT / 151 

3.9.2 Physical Meaning of DFT / 152 

3.9.3 Interpolation by Using DFS / 155 
Problems / 157 




4 Nonlinear Equations 179 

4.1 Iterative Method Toward Fixed Point / 179 

4.2 Bisection Method / 183 

4.3 False Position or Regula Falsi Method / 185 

4.4 Newton(-Raphson) Method / 186 

4.5 Secant Method / 189 

4.6 Newton Method for a System of Nonlinear Equations / 191 

4.7 Symbolic Solution for Equations / 193 

4.8 A Real-World Problem / 194 
Problems / 197 

5 Numerical Differentiation/Integration 209 

5.1 Difference Approximation for First Derivative / 209 

5.2 Approximation Error of First Derivative / 211 

5.3 Difference Approximation for Second and Higher 
Derivative / 216 

5.4 Interpolating Polynomial and Numerical Differential / 220 

5.5 Numerical Integration and Quadrature / 222 

5.6 Trapezoidal Method and Simpson Method / 226 

5.7 Recursive Rule and Romberg Integration / 228 

5.8 Adaptive Quadrature / 231 

5.9 Gauss Quadrature / 234 

5.9.1 Gauss-Legendre Integration / 235 

5.9.2 Gauss-Hermite Integration / 238 

5.9.3 Gauss-Laguerre Integration / 239 

5.9.4 Gauss-Chebyshev Integration / 240 

5.10 Double Integral / 241 

Problems / 244 

6 Ordinary Differential Equations 263 

6.1 Euler’s Method / 263 

6.2 Heun’s Method: Trapezoidal Method / 266 

6.3 Runge-Kutta Method / 267 

6.4 Predictor-Corrector Method / 269 

6.4.1 Adams-Bashforth-Moulton Method / 269 

6.4.2 Hamming Method / 273 

6.4.3 Comparison of Methods / 274 

6.5 Vector Differential Equations / 277 

6.5.1 State Equation / 277 

6.5.2 Discretization of LTI State Equation / 281 

6.5.3 High-Order Differential Equation to State Equation / 283 

6.5.4 Stiff Equation / 284 




6.6 Boundary Value Problem (BVP) / 287 

6.6.1 Shooting Method / 287 

6.6.2 Finite Difference Method / 290 
Problems / 293 

7 Optimization 321 

7.1 Unconstrained Optimization [L-2, Chapter 7] / 321 

7.1.1 Golden Search Method / 321 

7.1.2 Quadratic Approximation Method / 323 

7.1.3 Nelder-Mead Method [W-8] / 325 

7.1.4 Steepest Descent Method / 328 

7.1.5 Newton Method / 330 

7.1.6 Conjugate Gradient Method / 332 

7.1.7 Simulated Annealing Method [W-7] / 334 

7.1.8 Genetic Algorithm [W-7] / 338 

7.2 Constrained Optimization [L-2, Chapter 10] / 343 

7.2.1 Lagrange Multiplier Method / 343 

7.2.2 Penalty Function Method / 346 

7.3 MATLAB Built-In Routines for Optimization / 350 

7.3.1 Unconstrained Optimization / 350 

7.3.2 Constrained Optimization / 352 

7.3.3 Linear Programming (LP) / 355 
Problems / 357 

8 Matrices and Eigenvalues 371 

8.1 Eigenvalues and Eigenvectors / 371 

8.2 Similarity Transformation and Diagonalization / 373 

8.3 Power Method / 378 

8.3.1 Scaled Power Method / 378 

8.3.2 Inverse Power Method / 380 

8.3.3 Shifted Inverse Power Method / 380 

8.4 Jacobi Method / 381 

8.5 Physical Meaning of Eigenvalues/Eigenvectors / 385 

8.6 Eigenvalue Equations / 389 
Problems / 390 

9 Partial Differential Equations 401 

9.1 Elliptic PDE / 402 

9.2 Parabolic PDE / 406 

9.2.1 The Explicit Forward Euler Method / 406 

9.2.2 The Implicit Backward Euler Method / 407 




9.2.3 The Crank-Nicholson Method / 409 

9.2.4 Two-Dimensional Parabolic PDE / 412 

9.3 Hyperbolic PDE / 414 

9.3.1 The Explicit Central Difference Method / 415 

9.3.2 Two-Dimensional Hyperbolic PDE / 417 

9.4 Finite Element Method (FEM) for solving PDE / 420 

9.5 GUI of MATLAB for Solving PDEs: PDETOOL / 429 

9.5.1 Basic PDEs Solvable by PDETOOL / 430 

9.5.2 The Usage of PDETOOL / 431 

9.5.3 Examples of Using PDETOOL to Solve PDEs / 435 
Problems / 444 


Appendix A. Mean Value Theorem 461

Appendix B. Matrix Operations/Properties 463

Appendix C. Differentiation with Respect to a Vector 471

Appendix D. Laplace Transform 473

Appendix E. Fourier Transform 475

Appendix F. Useful Formulas 477

Appendix G. Symbolic Computation 481

Appendix H. Sparse Matrices 489

Appendix I. MATLAB 491

References 497

Subject Index 499

Index for MATLAB Routines 503

Index for Tables 509


PREFACE 


This book introduces applied numerical methods for engineering and science
students at the sophomore to senior levels; it targets today's students who do
not like, or do not have time, to derive and prove mathematical results. It can
also serve as a reference to MATLAB applications for professional engineers
and scientists, since many of the MATLAB codes presented after introducing
each algorithm’s basic ideas can easily be modified to solve similar problems,
even by those who do not know what is going on inside the MATLAB routines
and the algorithms they use. Just as most drivers only have to know where to
go and how to drive a car to get to their destinations, most users only have to
know how to define the problems they want to solve using MATLAB and how
to use the corresponding routines to solve their problems. We do not deny that
detailed knowledge about the algorithm (engine) of the program (car) is helpful
for getting safely to the solution (destination); we only imply that one-time users
of any MATLAB program or routine may use this book as well as the students
who want to understand the underlying principle of each algorithm.

In this book, we focus on understanding the fundamental mathematical concepts
and mastering problem-solving skills using numerical methods with the
help of MATLAB and skip some tedious derivations. Obviously, basic concepts
must be taught so that students can properly formulate the mathematics
problems. Afterwards, students can directly use the MATLAB codes to solve
practical problems. Almost every algorithm introduced in this book is followed
by example MATLAB code with a friendly interface so that students can easily
modify the code to solve real-life problems. The selection of exercises follows
the same philosophy of making the learning easy and practical. Students
should be able to solve similar problems immediately after taking the class using
the MATLAB codes we provide. For most students, and particularly nonmath
majors, understanding how to use numerical tools correctly in solving their
problems of interest is more important than studying lengthy proofs and derivations.

MATLAB is one of the most developed software packages available today.
It provides many numerical methods and it is very easy to use, even for people
without prior programming experience. We have supplemented MATLAB’s built-in
functions with more than 100 small MATLAB routines. Readers should find
these routines handy and useful. Some of these routines give better results for
some problems than the built-in functions. Students are encouraged to develop
their own routines following the examples.

The knowledge in this book is derived from the work of many eminent scientists,
scholars, researchers, and MATLAB developers, all of whom we thank.
We thank our colleagues, students, relatives, and friends for their support and
encouragement. We thank the reviewers, whose comments were so helpful in
tuning this book. We especially thank Senior Researcher Yong-Suk Park for his
invaluable help in correction. We thank the editorial and production staff of John
Wiley & Sons, Inc. including Editor Val Moliere and Production Editor Lisa
VanHorn for their kind, efficient, and encouraging guidance.


October 2004 


Won Young Yang 
Wenwu Cao 
Tae-Sang Chung 
John Morris 




MATLAB USAGE AND 
COMPUTATIONAL ERRORS 


1.1 BASIC OPERATIONS OF MATLAB 

MATLAB is a high-level software package with many built-in functions that 
make the learning of numerical methods much easier and more interesting. In 
this section we will introduce some basic operations that will enable you to 
learn the software and build your own programs for problem solving. In the 
workstation environment, you type “matlab” to start the program, while in the 
PC environment, you simply double-click the MATLAB icon. 

Once you start the MATLAB program, a Command window will open with the
MATLAB prompt >>. On the command line, you can type MATLAB commands,
functions together with their input/output arguments, and the names of script files
containing a block of statements to be executed at a time or functions defined
by users. The MATLAB program files must have the extension name ***.m to
be executed in the MATLAB environment. If you want to create a new M-file
or edit an existing file, you click File/New/M-file or File/Open in the top left
corner of the main menu, find/select/load the file by double-clicking it, and then
begin editing it in the Editor window. If the path of the file you want to run
is not listed in the MATLAB search path, the file name will not be recognized
by MATLAB. In such cases, you need to add the path to the MATLAB-path
list by clicking the menu ‘File/Set Path’ in the Command window, clicking the
‘Add Folder’ button, browsing/clicking the folder name, and finally clicking the
Save button and the Close button. The lookfor command helps you find the
MATLAB commands/functions related to a task you want done. The help
command shows the usage of a particular command/function. You may type
directly in the Command window

>>lookfor repeat    or    >>help for

to find the MATLAB commands in connection with ‘repeat’ or to obtain information
about the “for loop”.

1.1.1 Input/Output of Data from MATLAB Command Window 

MATLAB remembers all input data in a session (anything entered through direct
keyboard input or by running a script file) until the command ‘clear’ is given or
you exit MATLAB.

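One of the many features of MATLAB is that it enables us to deal with
vectors/matrices in the same way as scalars. For instance, to input the
matrices/vectors

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & -2 & 3 & -4 \end{bmatrix},$$

type in the MATLAB Command window as below: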

>>A = [1 2 3;4 5 6]
A =
     1     2     3
     4     5     6
>>B = [3; -2; 1]; %put the semicolon at the end of the statement to suppress the result printout onto the screen
>>C = [1 -2 3 -4]

At the end of the statement, press <Enter> if you want to check the result 
of executing the statement immediately. Otherwise, type a semicolon before 
pressing <Enter> so that your window will not be overloaded by a long display 
of results. 
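Because A, B, and C are ordinary arrays, arithmetic operators act on them as a whole. The short sketch below is our own illustration (the particular operations are not from the text); it multiplies the two matrices entered above and shows the difference between the matrix operator ‘*’ and the element-by-element operator ‘.^’:

```matlab
% Sketch: operators act on whole arrays (same A, B, C as entered above)
A = [1 2 3; 4 5 6];   % 2x3 matrix
B = [3; -2; 1];       % 3x1 column vector
C = [1 -2 3 -4];      % 1x4 row vector
AB = A*B              % matrix product (2x3)*(3x1) gives a 2x1 vector: [2; 8]
At = A'               % transpose: a 3x2 matrix
A2 = A.^2             % element-by-element square (note the dot)
nC = length(C)        % number of elements in C: 4
```

Dropping the dot (A^2) would request a matrix power, which fails here because A is not square.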

1.1.2 Input/Output of Data Through Files 

MATLAB can handle two types of data files. One is the binary format mat-files
named ***.mat. This kind of file can preserve the values of more than one
variable, but is handled mainly in the MATLAB environment and is not easily
shared with other programming environments. The other is the ASCII dat-files
named ***.dat, which can be shared with other programming environments, but
preserve the values of only one variable.

Below are a few sample statements for storing some data into a mat-file in 
the current directory and reading the data back from the mat-file: 

>>save ABC A B C %store the values of A,B,C into the file 'ABC.mat'
>>clear A C %clear the memory of MATLAB about A,C
>>A %what is the value of A?
??? Undefined function or variable 'A'
>>load ABC A C %read the values of A,C from the file 'ABC.mat'
>>A %the value of A
A =
     1     2     3
     4     5     6

If you want to store the data into an ASCII dat-file (in the current directory),
name the file after the data (in lowercase, since ‘load’ will later use the filename
as the variable name) and append ‘-ascii’ to the save statement.

>>save b.dat B -ascii

However, with the save/load commands into/from a dat-file, the value of only
one variable, a scalar or a vector/matrix, should be saved/loaded. Besides,
non-numeric data cannot be handled by using a dat-file: if you save string data
into a dat-file, their ASCII codes will be saved. If a dat-file is constructed to
hold a data matrix in an environment other than MATLAB, every line (row) of
the file must have the same number of columns. If you want to read the data
from the dat-file in MATLAB, just type the (lowercase) filename ***.dat after
‘load’; the filename will also be recognized as the name of the data contained
in the dat-file.

>>load b.dat %read the value of variable b from the ASCII file 'b.dat'
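The same round trip can also be written with the functional forms of save and load, which take the file name as a string; called with an output, load returns the numbers instead of creating a variable. A minimal sketch, assuming the file is placed in tempdir (a location chosen only for this demo) rather than in the current directory:

```matlab
% Sketch: round trip of one variable through an ASCII dat-file
B = [3; -2; 1];
fname = fullfile(tempdir, 'b.dat');  % hypothetical location chosen for this demo
save(fname, 'B', '-ascii');          % functional form of: save b.dat B -ascii
b = load(fname);                     % functional form returns the numbers
delete(fname);                       % remove the demo file
disp(b')                             % the row 3 -2 1
```

By default ‘-ascii’ keeps only 8 significant digits, which is exact for these small integers; MATLAB's save also accepts ‘-double’ together with ‘-ascii’ when 16 digits are needed.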

On the MATLAB command line, you can type ‘nm112’ to run the following
M-file ‘nm112.m’ consisting of several file input(save)/output(load) statements.
Then you will see the effects of the individual statements from the running
results appearing on the screen.


%nm112.m
clear
A = [1 2 3;4 5 6]
B = [3;-2;1];
C(2) = 2; C(4) = 4
disp('Press any key to see the input/output through Files'), pause
save ABC A B C %save A,B & C as a MAT-file named 'ABC.mat'
clear('A','C') %remove the memory about A and C
load ABC A C %read MAT-file to recollect the memory about A and C
save b.dat B -ascii %save B as an ASCII-file named 'b.dat'
clear B
load b.dat %read ASCII-file to recollect the memory about b
b
x = input('Enter x: ')
format short e, x
format rat, x
format long, x
format short, x
1.1.3 Input/Output of Data Using Keyboard

The command ‘input’ enables the user to input some data via the keyboard. 
For example, 

>>x = input('Enter x: ')
Enter x: 1/3
x = 0.3333

Note that the fraction 1/3 is a nonterminating decimal number, but only four 
digits after the decimal point are displayed as the result of executing the above 
command. This is a choice of formatting in MATLAB. One may choose to 
display more decimal places by using the command ‘format’, which can make 
a fraction show up as a fraction, as a decimal number with more digits, or even 
in an exponential form of a normalized number times 10 to the power of some 
integer. For instance: 

>>format rat %as a rational number
>>x
x = 1/3
>>format long %as a decimal number with 14 digits
>>x
x = 0.33333333333333
>>format long e %as a long exponential form
>>x
x = 3.333333333333333e-001
>>format hex %as a hexadecimal form as represented/stored in memory
>>x
x = 3fd5555555555555
>>format short e %as a short exponential form
>>x
x = 3.3333e-001
>>format short %back to a short form (default)
>>x
x = 0.3333

Note that the number of displayed digits is not the actual number of significant 
digits of the value stored in computer memory. This point will be made clear in 
Section 1.2.1. 
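This can be checked directly. The sketch below prints the same stored value 1/3 with different numbers of decimal places using sprintf(), and reads its 64 stored bits with num2hex(), which returns the hexadecimal string that ‘format hex’ displays:

```matlab
% Sketch: 'format' changes only the display; the stored double stays the same
x = 1/3;
s4  = sprintf('%.4f', x)    % 4 decimals, as 'format short' shows
s15 = sprintf('%.15f', x)   % 15 decimals of the very same stored value
h   = num2hex(x)            % the 64 stored bits, as 'format hex' shows
```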

There are other ways of displaying the value of a variable and a string on the
screen than typing the name of the variable. Two useful commands are ‘disp()’
and ‘fprintf()’. The former displays the value of a variable or a string without
‘x = ’ or ‘ans = ’; the latter displays the values of several variables in a specified
format and with explanatory/cosmetic strings. For example:

>>disp('The value of x = '),disp(x) %disp('string_to_display' or variable_name)
The value of x =
    0.3333

Table 1.1 summarizes the type specifiers and special characters that are used in 
‘fprintf ()’ statements. 
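As a quick illustration of several specifiers in Table 1.1, the sketch below formats a few arbitrary values with sprintf(), which takes the same format string as fprintf() but returns the result as a string instead of printing it:

```matlab
% Sketch: some type specifiers and special characters of Table 1.1
s1 = sprintf('%d items', 42)           % decimal integer
s2 = sprintf('%5.2f', pi)              % field width 5, 2 decimals: ' 3.14'
s3 = sprintf('%x', 255)                % hexadecimal integer: 'ff'
s4 = sprintf('%s = %c', 'name', 'z')   % string and character types
s5 = sprintf('100%%')                  % '%%' prints a literal percent sign
fprintf('x = %8.4f\n', 1/3)            % prints to the screen, then a new line
```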

Table 1.1 Type Specifiers and Special Characters Used in fprintf() Statements
(Printing form: fprintf('**format string**', variables_to_be_printed, ...))

Type Specifier  Meaning                                   Special Character  Meaning
%c              Character type                            \n                 New line
%s              String type                               \t                 Tab
%d              Decimal integer number type               \b                 Backspace
%f              Floating point number type                \r                 CR return
%e              Decimal exponential type                  \f                 Form feed
%x              Hexadecimal integer number                %%                 %
%bx             Floating number in 16 hexadecimal
                digits (64 bits)

Below is a program that uses the command ‘input’ so that the user can
input some data via the keyboard. If we run the program, it gets a value of the
temperature in Fahrenheit [°F] from the user via the keyboard, converts it into
the temperature in Centigrade [°C], and then prints the results with some remarks
both onto the screen and into a data file named ‘nm113.dat’.


%nm113.m
f = input('Input the temperature in Fahrenheit[F]: ');
c = 5/9*(f-32); %convert into Centigrade
fprintf('%5.2f(in Fahrenheit) is %5.2f(in Centigrade).\n',f,c) %onto the screen
fid = fopen('nm113.dat', 'w'); %open (create) the data file for writing
fprintf(fid, '%5.2f(Fahrenheit) is %5.2f(Centigrade).\n',f,c); %into the file
fclose(fid);
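Since the conversion formula uses only scalar multiplication and subtraction, it also works on a whole vector of temperatures at once. A minimal sketch, assuming an anonymous function named f2c (a hypothetical name) and a few sample inputs of our own choosing:

```matlab
% Sketch: vectorized Fahrenheit -> Centigrade conversion
f2c = @(f) 5/9*(f - 32);   % hypothetical helper; works element by element
f = [32 98.6 212];         % sample temperatures in Fahrenheit
c = f2c(f);                % about [0 37 100]
fprintf('%6.2f(in Fahrenheit) is %6.2f(in Centigrade).\n', [f; c]);
```

Because fprintf() reuses its format string until all the data are consumed, passing the 2 x 3 matrix [f; c] prints one line per temperature, taking the elements column by column.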


In case you want the keyboard input to be recognized as a string, you should
add the character 's' as the second input argument.

>>answer = input('Answer <yes> or <no>: ','s')

1.1.4 2-D Graphic Input/Output 

How do we plot the value(s) of a vector or an array? Suppose that data reflecting 
the highest/lowest temperatures for 5 days are stored as a 5 x 2 array in an ASCII 
file named ‘temp.dat’. 

The job of the MATLAB program “nm114_1.m” is to plot these data. Running
the program yields the graph shown in Fig. 1.1a. Note that the first line is a
comment about the name and the functional objective of the program (file), and
the fourth and fifth lines are auxiliary statements that designate the graph title
and units of the vertical/horizontal axis; only the second and third lines are
indispensable in drawing the colored graph. We need only a few MATLAB statements
for this artwork, which shows the power of MATLAB.


%nm114_1: plot the data of a 5x2 array stored in "temp.dat"
load temp.dat
clf, plot(temp) %clear any existent figure and plot
title('the highest/lowest temperature of these days')
ylabel('degrees[C]'), xlabel('day')
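The text does not list the contents of ‘temp.dat’. To try nm114_1 without it, one can first write a 5 x 2 ASCII file; the temperatures below are made-up values used only for illustration:

```matlab
% Sketch: create a hypothetical temp.dat (made-up highest/lowest temperatures [C])
temp = [25 15; 27 16; 26 14; 29 18; 28 17];
save temp.dat temp -ascii   % write a 5x2 ASCII file in the current directory
clear temp
load temp.dat               % recreates the variable temp from the file
size(temp)                  % 5 2, so plot(temp) draws two piecewise-linear lines
```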
Figure 1.1 Plot of a 5 x 2 matrix data representing the highest/lowest temperature ((b): domain of the horizontal variable specified).


Here are several things to keep in mind. 

• The command plot() reads along the columns of the 5x2 array data given
as its input argument and recognizes each column as the value of a vector.

• MATLAB assumes the domain of the horizontal variable to be [1 2 .. 5] by
default, where 5 equals the length of the vector to be plotted (see Fig. 1.1a).

• The graph is constructed by connecting the data points with straight lines
and is piecewise-linear, although it looks like a curve when the data points are
densely collected. Note that the graph can be plotted as points in various
forms according to the optional input argument described in Table 1.2.

(Q1) Suppose the data in the array named ‘temp’ are the highest/lowest temperatures
measured on the 11th, 12th, 14th, 16th, and 17th days, respectively. How should we
modify the above program to have the actual days shown on the horizontal axis?

(A1) Just make the day vector [11 12 14 16 17] and use it as the first input argument
of the plot() command.

>>days = [11 12 14 16 17]
>>plot(days,temp)

Executing these statements, we obtain the graph in Fig. 1.1b.

(Q2) What statements should be added to change the ranges of the horizontal/vertical
axes into 10-20 and 0-30, respectively, and draw the grid on the graph?

(A2) >>plot(days,temp)
     >>axis([10 20 0 30]), grid on

Table 1.2 Graphic Line Specifications Used in the plot() Command

Line Type         Point Type (Marker Symbol)     Color
-   solid line    .  (dot)                       r : red
:   dotted line   +  (plus)                      g : green
--  dashed line   *  (asterisk)                  b : blue
-.  dash-dot      o  (circle)                    k : black
                  x : x-mark                     m : magenta
                  s : □                          y : yellow
                  d : ◇                          c : cyan (sky blue)
                  p : ☆
                  v : ▽
                  > : ▷
                  < : ◁

(Q3) How do we make the scales of the horizontal/vertical axes equal so that a circle
appears round, not like an ellipse?

(A3) >>axis('equal')

(Q4) How do we overlay another graph onto an existing graph?

(A4) If you use the ‘hold on’ command after plotting the first graph, any following
graphs in the same figure will be overlaid onto the existing one(s) rather
than plotted anew. For example:

>>hold on, plot(days,temp(:,1),'b*', days,temp(:,2),'ro')

This remains in effect until you issue the command ‘hold off’ or clear all the graphs
in the graphic window by using the ‘clf’ command.

Sometimes we need to see the interrelationship between two variables. Suppose
we want to plot the lowest/highest temperature, respectively, along the
horizontal/vertical axis in order to grasp the relationship between them. Let us
try using the following command:

>>plot(temp(:,1),temp(:,2),'kx') % temp(:,2) vs. temp(:,1) in black 'x'

This will produce a pointwise graph, which is fine. But if you replace the third
input argument by ‘b:’ or just omit it to draw a piecewise-linear graph connecting
the data points as in Fig. 1.2a, the graphic result looks clumsy, because the data on
the horizontal axis are not arranged in ascending or descending order. The graph
will look better if you sort the data on the horizontal axis, reorder the data on
the vertical axis accordingly, and then plot the relationship in the piecewise-linear
style by typing the MATLAB commands as follows:

>>[temp1,I] = sort(temp(:,1)); temp2 = temp(I,2);
>>plot(temp1,temp2)

The graph obtained by using these commands is shown in Fig. 1.2b, which looks 
more informative than Fig. 1.2a. 
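The effect of sort() with its second output can be checked on a small data set. In the sketch below the 5 x 2 array is hypothetical; the index vector I returned by sort() reorders the second column so that each pair stays together, and sortrows() is shown as an equivalent one-step alternative:

```matlab
% Sketch: sort one column and carry the other column along (made-up data)
temp = [25 15; 27 16; 26 14; 29 18; 28 17];
[temp1, I] = sort(temp(:,1));   % ascending first column and its permutation
temp2 = temp(I,2);              % second column reordered the same way
S = sortrows(temp, 1);          % sorts whole rows by column 1
% plot(temp1, temp2)            % uncomment to draw the ordered graph
```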



Figure 1.2 Examples of graphs obtained using the plot() command: (a) data not arranged; (b) data arranged along the horizontal axis.
We can also use the plot() command to draw a circle.

>>r = 1; th = [0:0.01:2]*pi; % [0:0.01:2] makes [0 0.01 0.02 .. 2]
>>plot(r*cos(th), r*sin(th))
>>plot(r*exp(j*th)) %alternatively

Note that the plot() command with a sequence of complex numbers as its first
input argument plots the real/imaginary parts along the horizontal/vertical axis.
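That the two plot() calls draw the same circle follows from Euler's formula, r e^{jθ} = r cos θ + j r sin θ. The sketch below checks it numerically; the plot itself is left commented out so the check runs without a figure window:

```matlab
% Sketch: real/imaginary parts of r*exp(j*th) are the circle coordinates
r = 1; th = [0:0.01:2]*pi;
z = r*exp(1j*th);                        % 1j is the imaginary unit, like j
err_re = max(abs(real(z) - r*cos(th)));  % horizontal coordinate
err_im = max(abs(imag(z) - r*sin(th)));  % vertical coordinate
% plot(z), axis('equal')                 % uncomment to draw the circle
```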

The polar() command plots the phase (in radians)/magnitude given as its
first/second input argument, respectively (see Fig. 1.3a).

>>polar(th,exp(-th)) %polar plot of a spiral

Several other plotting commands, such as semilogx(), semilogy(), loglog(),
stairs(), stem(), bar()/barh(), and hist(), may be used to draw various
graphs (shown in Figs. 1.3 and 1.4). Readers may use the ‘help’ command to get
the detailed usage of each one and try running the following MATLAB program
‘nm114_2.m’.


%nm114_2: plot several types of graph
th = [0: .02:1]*pi;
subplot(221), polar(th,exp(-th))
subplot(222), semilogx(exp(th))
subplot(223), semilogy(exp(th))
subplot(224), loglog(exp(th))
pause, clf

subplot(221), stairs([1 3 2 0])
subplot(222), stem([1 3 2 0])
subplot(223), bar([2 3; 4 5])
subplot(224), barh([2 3; 4 5])
pause, clf

y = [0.3 0.9 1.6 2.7 3 2.4];
subplot(221), hist(y,3)
subplot(222), hist(y,0.5 + [0 1 2])
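When hist() is called with output arguments, it returns the bin counts and bin centers instead of drawing the bar graph, which makes it easy to see how the two calls in nm114_2 group the data:

```matlab
% Sketch: bin counts behind the two hist() graphs of nm114_2
y = [0.3 0.9 1.6 2.7 3 2.4];
[n1, x1] = hist(y, 3)              % 3 equal-width bins spanning min(y)..max(y)
[n2, x2] = hist(y, 0.5 + [0 1 2])  % bins centered at 0.5, 1.5, 2.5
```

In both cases two values fall in the first bin, one in the second, and three in the third.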


Moreover, the commands sprintf(), text(), and gtext() are used for combining
supplementary statements with the value(s) of one or more variables to
construct a string and printing it at a certain location on the existing graph.
For instance, let us try the following statements in the MATLAB Command
window:

>>f = 1./[1:10]; plot(f)
>>n = 3; [s,errmsg] = sprintf('f(%1d) = %5.2f',n,f(n))
>>text(3,f(3),s) %writes the text string at the point (3,f(3))
>>gtext('f(x) = 1/x') %writes the input string at the point clicked by mouse
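Since sprintf() only builds a string, its result can be inspected before text() places it on the graph. For n = 3, the format %5.2f prints f(3) = 1/3 in a field five characters wide with two decimals, so a blank pads the left of 0.33:

```matlab
% Sketch: the string that text(3,f(3),s) would write on the graph
f = 1./[1:10];
n = 3;
s = sprintf('f(%1d) = %5.2f', n, f(n))   % 'f(3) =  0.33'
```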

Figure 1.4 Graphs drawn by various graphic commands.

The command ginput() allows you to obtain the coordinates of a point
by clicking the mouse button on the existing graph. Let us try the following
commands:

>>[x,y,butkey] = ginput %get the x,y coordinates & # of the mouse button or ascii code of the key pressed till pressing the ENTER key
>>[x,y,butkey] = ginput(n) %repeat the same job for up to n points clicked

1.1.5 3-D Graphic Output 

MATLAB has several 3-D graphic plotting commands such as plot3(), mesh(),
and contour(). plot3() plots a 2-D valued function of a scalar-valued variable;
mesh()/contour() plot a scalar-valued function of a 2-D variable in a
mesh/contour-like style, respectively.

Readers are recommended to use the help command for detailed usage of each
command. Try running the MATLAB program ‘nm115.m’ to see what figures
will appear (Figs. 1.5 and 1.6).


%nm115: to plot 3D graphs
t = 0:pi/50:6*pi;
expt = exp(-0.1*t);
xt = expt.*cos(t); yt = expt.*sin(t);
%dividing the screen into 2x2 sections
subplot(221), plot3(xt, yt, t), grid on %helix
subplot(222), plot3(xt, yt, t), grid on, view([0 0 1])
subplot(223), plot3(t, xt, yt), grid on, view([1 -3 1])
subplot(224), plot3(t, yt, xt), grid on, view([0 -3 0])
pause, clf

x = -2:.1:2; y = -2:.1:2;
[X,Y] = meshgrid(x,y); Z = X.^2 + Y.^2;
subplot(221), mesh(X,Y,Z), grid on %[azimuth,elevation] = [-37.5,30]
subplot(222), mesh(X,Y,Z), view([0,20]), grid on
pause, view([30,30])
subplot(223), contour(X,Y,Z)
subplot(224), contour(X,Y,Z,[.5,2,4.5])
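The surface plotted by nm115 is built by meshgrid(), which repeats x along the rows and y along the columns so that Z = X.^2 + Y.^2 can be computed element by element over the whole grid. The sketch below inspects the resulting arrays; on this surface the contour levels 0.5, 2, and 4.5 used above are circles of radius sqrt(0.5), sqrt(2), and sqrt(4.5):

```matlab
% Sketch: what meshgrid() hands to mesh() and contour()
x = -2:.1:2; y = -2:.1:2;
[X, Y] = meshgrid(x, y);   % X(i,k) = x(k), Y(i,k) = y(i)
Z = X.^2 + Y.^2;           % element-by-element: the dots are required
sz = size(Z)               % a 41 x 41 grid
z_corner = Z(1,1)          % at (x,y) = (-2,-2): 8
z_center = Z(21,21)        % at (x,y) = (0,0): 0 (up to rounding)
```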


Figure 1.5 Graphs drawn by the plot3() command with different views: (a) plot3(cos(t), sin(t), t); (b) plot3(), view([0 0 1]); (c) plot3(), view([1 -3 1]); (d) plot3(), view([0 -3 0]).

Figure 1.6 Graphs drawn by the mesh() and contour() commands: (c) contour(X,Y,Z); (d) contour(X,Y,Z,[0.5, 2, 4.5]).

1.1.6 Mathematical Functions

Mathematical functions and special reserved constants/variables defined in MATLAB
are listed in Table 1.3.

MATLAB also allows us to define our own function and store it in a file
named after the function name so that it can be used as if it were a built-in
function. For instance, we can define a scalar-valued function

$$f_1(x) = \frac{1}{1 + 8x^2}$$

and a vector-valued function

$$\mathbf{f}(\mathbf{x}) = \begin{bmatrix} f_1(x_1, x_2) \\ f_2(x_1, x_2) \end{bmatrix} = \begin{bmatrix} x_1^2 + 4x_2^2 - 5 \\ 2x_1^2 - 2x_1 - 3x_2 - 2.5 \end{bmatrix}$$

as follows.
Table 1.3 Functions and Variables Inside MATLAB

| Function | Remark | Function | Remark |
|---|---|---|---|
| cos(x) | | exp(x) | Exponential function |
| sin(x) | | log(x) | Natural logarithm |
| tan(x) | | log10(x) | Common logarithm |
| acos(x) | cos^-1(x) | abs(x) | Absolute value |
| asin(x) | sin^-1(x) | angle(x) | Phase of a complex number [rad] |
| atan(x) | -π/2 < tan^-1(x) < π/2 | sqrt(x) | Square root |
| atan2(y,x) | -π < tan^-1(y/x) ≤ π | real(x) | Real part |
| cosh(x) | (e^x + e^-x)/2 | imag(x) | Imaginary part |
| sinh(x) | (e^x - e^-x)/2 | conj(x) | Complex conjugate |
| tanh(x) | (e^x - e^-x)/(e^x + e^-x) | round(x) | The nearest integer (round-off) |
| acosh(x) | cosh^-1(x) | fix(x) | The nearest integer toward 0 |
| asinh(x) | sinh^-1(x) | floor(x) | The greatest integer ≤ x |
| atanh(x) | tanh^-1(x) | ceil(x) | The smallest integer ≥ x |
| max | Maximum and its index | sign(x) | 1 (positive)/0/-1 (negative) |
| min | Minimum and its index | mod(y,x) | Remainder of y/x (same sign as x) |
| sum | Sum | rem(y,x) | Remainder of y/x (same sign as y) |
| prod | Product | eval(f) | Evaluate an expression |
| find | Index of element(s) | roots | Roots of polynomial |
| flops(0) | Reset the flops count to 0 | tic | Start a stopwatch timer |
| flops | Cumulative # of floating-point operations (unavailable in MATLAB 6.x) | toc | Read the stopwatch timer (elapsed time from tic) |
| date | Present date | magic | Magic square |

Reserved Variables with Special Meaning

| Variable | Remark | Variable | Remark |
|---|---|---|---|
| i, j | √-1 | pi | π |
| eps | Machine epsilon (floating-point relative accuracy) | realmax, realmin | Largest/smallest positive normalized number |
| break | Exit while/for loop | Inf, inf | Infinity (∞) |
| end | The end of a for-loop or if, while, case statement, or an array index | NaN | Not_a_Number (undetermined) |
| nargin | Number of input arguments | nargout | Number of output arguments |
| varargin | Variable input argument list | varargout | Variable output argument list |

Once we store these functions in the files named ‘f1.m’ and ‘f49.m’ after the function names, respectively, we can call and use them as needed inside another M-file or in the MATLAB Command window.

»f1([0 1]) %several values of a scalar function of a scalar variable
ans = 1.0000 0.1111

»f49([0 1]) %a value of a 2-D vector function of a vector variable
ans = -1.0000 -5.5000

»feval('f1',[0 1]), feval('f49',[0 1]) %equivalently, yields the same
ans = 1.0000 0.1111

ans = -1.0000 -5.5000

(Q5) With the function f1(x) defined as a scalar function of a scalar variable, we enter a vector as its input argument and obtain a seemingly vector-valued output. What's going on?

(A5) It is just a set of function values [f1(x1) f1(x2) ...] obtained at once for several values [x1 x2 ...] of x. In anticipation of such one-shot multiple evaluation, it is good practice to put a dot (.) just before the arithmetic operators * (multiplication), / (division), and ^ (power) in the function definition so that the operations are always carried out term by term (termwise).

Note that we can define a simple function not only in an independent M-file, but also inside a program by using the inline() command, or simply as a literal string expression that can be evaluated by the eval() command.

»f1 = inline('1./(1+8*x.^2)','x');

»f1([0 1]), feval(f1,[0 1])
ans = 1.0000 0.1111

ans = 1.0000 0.1111

»f1 = '1./(1+8*x.^2)'; x = [0 1]; eval(f1)
ans = 1.0000 0.1111 
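In newer MATLAB releases, inline() is deprecated in favor of anonymous functions created with the @ operator. The following is a minimal sketch, assuming a MATLAB version that supports function handles. It writes the two functions of this section as anonymous functions instead of M-files. The handle names f1 and f49 simply mirror the file names used above, and f49 follows the vector-valued function defined earlier in this section.

```matlab
% Anonymous-function versions of f1 and f49 (sketch; no M-files needed)
f1  = @(x) 1./(1 + 8*x.^2);              % termwise ./ and .^ accept vectors
f49 = @(x) [x(1)^2 + 4*x(2)^2 - 5, ...   % first component f1(x1,x2)
            2*x(1)^2 - 2*x(1) - 3*x(2) - 2.5];  % second component f2(x1,x2)
y1 = f1([0 1])            % 1.0000    0.1111
y2 = f49([0 1])           % -1.0000   -5.5000
y3 = feval(f1, [0 1])     % feval() also accepts a function handle
```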

A polynomial function can simply be defined by its coefficient vector arranged in descending order of powers. Its value(s) at given value(s) of the independent variable can be obtained with the command polyval().

»p = [1 0 -3 2]; %polynomial function p(x) = x^3 - 3x + 2
»polyval(p,[0 1])

ans = 2.0000 0.0000

In MATLAB, two polynomials can be multiplied by taking the convolution of the coefficient vectors that represent them, since

$$(a_N x^N + \cdots + a_1 x + a_0)(b_N x^N + \cdots + b_1 x + b_0) = c_{2N} x^{2N} + \cdots + c_1 x + c_0$$

where

$$c_k = \sum_{m=\max(0,\,k-N)}^{\min(k,\,N)} a_{k-m}\, b_m \quad \text{for } k = 2N, 2N-1, \ldots, 1, 0$$

This operation can be done by using the MATLAB built-in command conv() as 
illustrated below. 

»a = [1 -1]; b = [1 1 1]; c = conv(a,b)

c = 1 0 0 -1 %meaning that (x - 1)(x^2 + x + 1) = x^3 + 0·x^2 + 0·x - 1

If you only want to multiply a polynomial by x^n, you can simply append n zeros to the right end of the polynomial coefficient vector.

»a = [1 2 3]; c = [a 0 0] %equivalently, c = conv(a,[1 0 0]) 

c = 1 2 3 0 0 %meaning that (x^2 + 2x + 3)x^2 = x^4 + 2x^3 + 3x^2 + 0·x + 0
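The following sketch uses two explicit loops to evaluate the sum for $c_k$ and shows that conv() computes the same result. MATLAB stores coefficients in descending order and starts indices at 1, so the loop bounds below are shifted versions of max(0, k-N) and min(k, N) in the formula. This is only an illustration of the convolution sum. It does not show how conv() is implemented internally.

```matlab
% Direct evaluation of the convolution sum for polynomial multiplication
a = [1 -1];                 % x - 1
b = [1 1 1];                % x^2 + x + 1
na = length(a); nb = length(b);
c = zeros(1, na + nb - 1);  % product has degree (na-1) + (nb-1)
for k = 1:na + nb - 1
   for j = max(1, k - na + 1):min(k, nb)   % keep both indices in range
      c(k) = c(k) + a(k - j + 1)*b(j);
   end
end
c                           % 1  0  0  -1, i.e., x^3 - 1
c_builtin = conv(a, b)      % same coefficients from the built-in command
```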
1.1.7 Operations on Vectors and Matrices

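We can define a new scalar/vector/matrix, or redefine an existing one, either in terms of existing ones or independently of them. In the MATLAB Command window, let us define A and B as

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix}$$

by typing

»A = [1 2 3;4 5 6], B = [3;-2;1]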

We can modify them or take a portion of them. For example: 

»A = [A;7 8 9] 

A = 1 2 3 

4 5 6 

7 8 9 

»B = [B [1 0 -1]']

B = 3 1

-2 0

1 -1

Here, the apostrophe (prime) operator (') takes the complex conjugate transpose, so for real-valued matrices it acts simply as a transpose operator. If you want just the transpose of a complex-valued matrix, without conjugation, put a dot (.) before the apostrophe, that is, use the operator (.').
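A small complex-valued example shows the difference between the two operators. The matrices Z and R are made up for illustration.

```matlab
% Conjugate transpose (') versus plain transpose (.') of a complex matrix
Z  = [1+2i 3; -1i 4-1i];
Zh = Z'                 % conjugate transpose: Zh(m,n) = conj(Z(n,m))
Zt = Z.'                % plain transpose:     Zt(m,n) = Z(n,m)
% For a real-valued matrix the two operators give the same result
R  = [1 2 3; 4 5 6];
same_for_real = isequal(R', R.')   % logical 1 (true)
```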

When extending an existing matrix or defining another one based on it, you must keep the dimensions compatible. For instance, if you try to annex a 4 x 1 matrix to the 3 x 2 matrix B (which has only three rows), MATLAB rejects it outright with an error message.

»B = [B ones(4,1)]

??? All matrices on a row in the bracketed expression must have
the same number of rows.

We can modify or refer to a portion of a given matrix. 

»A(3,3) = 0 
A = 1 2 3 

4 5 6 

7 8 0 

»A(2:3,1:2) %from 2nd row to 3rd row, from 1st column to 2nd column
ans = 4 5

7 8

»A(2,:) %2nd row, all columns
ans =4 5 6 

The colon (:) is used to define an arithmetic (equal-difference) sequence without the brackets [ ], as in

»t = 0:0.1:2

t = [0.0 0.1 0.2 ... 1.9 2.0]

(Q6) What if we omit the increment between the left/right boundary numbers? 

(A6) By default, the increment is 1. 

»t = 0:2 
t = 0 1 2 

(Q7) What if the right boundary number is smaller/greater than the left boundary 
number with a positive/negative increment? 

(A7) It yields an empty matrix, which is useless. 

»t = 0: -2 

t = Empty matrix: 1-by-0 
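The increment 0.1 has no exact binary representation, so the elements of 0:0.1:2 may differ from the intended decimal values by round-off. The sketch below compares the colon operator with linspace(), which takes the number of points instead of the increment. It also checks the empty case of (A7). The variable names are illustrative.

```matlab
% Colon operator versus linspace() for the same grid
t  = 0:0.1:2;             % increment 0.1 is not exactly representable
n  = numel(t)             % 21 points: 0, 0.1, ..., 2.0
t2 = linspace(0, 2, 21);  % same grid, specified by the number of points
dev = max(abs(t - t2))    % round-off level (possibly exactly 0)
e  = 0:-2;                % right boundary < left with positive increment
sz = size(e)              % 1 0, i.e., an empty 1-by-0 row vector
```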

(Q8) If we define just some elements of a vector not fully, but sporadically, will we 
have a row vector or a column vector and how will it be filled in between? 
(A8) We will have a row vector filled with zeros between the defined elements. 

»D(2) = 2; D(4) = 3 
D = 0 2 0 3 

(Q9) How do we make a column vector in the same way?

(A9) We must first initialize it as a (zero-filled) column vector and then assign the values.

»D = zeros(4,1) ; D(2) = 2; D(4) = 3 
D = 0 

2 

0 

3 

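(Q10) What happens if the specified element index of an array exceeds the defined range?

(A10) Referring to such an element is rejected, although assigning a value to it simply extends the array, as in (A8). Also, MATLAB does not accept nonpositive or noninteger indices at all.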
»D(5) 

??? Index exceeds matrix dimensions. 

»D(0) = 1; 

??? Index into matrix is negative or zero. 

»D( 1.2) 

??? Subscript indices must either be real positive 
integers .. 

(Q11) How do we know the size (the numbers of rows/columns) of an already-defined array?

(A11) Use the length() and size() commands as shown below.

»length(D) 
ans = 4 

»[M,N] = size(A) 

M = 3 
N = 3 

MATLAB lets us handle vector/matrix operations in almost the same way as scalar operations. However, the vectors/matrices involved must have compatible dimensions, and a dot (.) must be put in front of the operator for termwise (element-by-element) operations. Adding a scalar to a matrix adds the scalar to every element of the matrix, and multiplying a matrix by a scalar multiplies every element of the matrix by the scalar.

There are several things to know about the matrix division and inversion. 

Remark 1.1. Rules of Vector/Matrix Operation 

1. For a matrix to be invertible, it must be square and nonsingular; that is, the 
numbers of its rows and columns must be equal and its determinant must 
not be zero. 

2. The MATLAB command pinv(A) gives a matrix X of the same dimension as $A^T$ such that $AXA = A$ and $XAX = X$. For a matrix A given as its input argument, this command yields the right pseudo- (generalized) inverse $A^T[AA^T]^{-1}$ when the number (M) of rows is smaller than the number (N) of columns, and the left pseudo-inverse $[A^TA]^{-1}A^T$ when M is greater than N. This holds as long as the matrix is of full rank, that is, rank(A) = min(M, N) [K-1, Section 6.4]. $A^T[AA^T]^{-1}$ and $[A^TA]^{-1}A^T$ are called the right and left inverse because they multiply A from the right and from the left, respectively, to yield an identity matrix.

3. You should be careful when using the pinv (A) command for a rank- 
deficient matrix, because its output is no longer the right/left inverse, which 
does not even exist for rank-deficient matrices. 

4. The value of a scalar function having an array value as its argument is also 
an array with the same dimension. 
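As a numerical check of item 2, the following sketch compares pinv() with the right-inverse formula for a full-rank 2 x 3 matrix. Its entries are the same as those of A1 in the examples that follow. The sketch also evaluates the residuals of the defining conditions AXA = A and XAX = X. Residuals of order 1e-15 are just round-off.

```matlab
% pinv() versus the right inverse A'[AA']^-1 for a full-rank 2 x 3 matrix
A  = [-1 2 3; 4 5 2];           % M = 2 < N = 3, rank(A) = 2
X  = pinv(A);                   % 3 x 2, same dimension as A'
Xr = A'/(A*A');                 % A'[AA']^-1 computed with the slash operator
e1 = norm(X - Xr)               % round-off level: pinv gives the right inverse
e2 = norm(A*X*A - A)            % condition AXA = A
e3 = norm(X*A*X - X)            % condition XAX = X
e4 = norm(A*X - eye(2))         % right inverse: AX = I (2 x 2)
e5 = norm(X*A - eye(3))         % XA is not I (3 x 3): no left inverse
```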

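Suppose we have defined vectors $a_1, a_2, b_1, b_2$ and matrices $A_1, A_2, B$ as follows:

»a1 = [-1 2 3]; a2 = [4 5 2]; b1 = [1 -3]'; b2 = [-2 0];

$$a_1 = [-1\ \ 2\ \ 3], \quad a_2 = [4\ \ 5\ \ 2], \quad b_1 = \begin{bmatrix} 1 \\ -3 \end{bmatrix}, \quad b_2 = [-2\ \ 0]$$

»A1 = [a1;a2], A2 = [a1;[b2 1]], B = [b1 b2']

$$A_1 = \begin{bmatrix} -1 & 2 & 3 \\ 4 & 5 & 2 \end{bmatrix}, \quad A_2 = \begin{bmatrix} -1 & 2 & 3 \\ -2 & 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & -2 \\ -3 & 0 \end{bmatrix}$$

The results of various operations on these vectors/matrices are as follows (pay attention to the error messages):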

»A3 = A1 + A2, A4 = A1 - A2, 1 + A1 %matrix/scalar addition/subtraction
A3 = -2 4 6

2 5 3

A4 = 0 0 0

6 5 1

ans = 0 3 4

5 6 3

»AB = A1*B % AB(m,n) = sum_k A1(m,k)*B(k,n): matrix multiplication?

??? Error using ==> * 

Inner matrix dimensions must agree. 

»BA1 = B*A1 % regular matrix multiplication 
BA1 = -9 -8 -1 

3 -6 -9 

»AA = A1.*A2 %termwise multiplication 
AA = 1 4 9 

-8 0 2 

»AB = A1.*B % AB(m,n) = A1(m,n)*B(m,n): termwise multiplication
??? Error using ==> .* 

Matrix dimensions must agree. 

»A1_1 = pinv(A1), A1'*(A1*A1')^-1, eye(size(A1,2))/A1 % A1'[A1*A1']^-1

A1_1 = -0.1914 0.1399 %right inverse of a 2 x 3 matrix A1 

0.0617 0.0947 

0.2284 -0.0165 

»A1*A1_1 %A1*A1_1 = I implies the validity of A1_1 as the right inverse
ans = 1.0000 0.0000 

0.0000 1.0000 

»A5 = A1'; % a 3 x 2 matrix

»A5_1 = pinv(A5), (A5'*A5)^-1*A5', A5\eye(size(A5,1)) % [A5'*A5]^-1*A5'

A5_1 = -0.1914 0.0617 0.2284 %left inverse of a 3x2 matrix A5 

0.1399 0.0947 -0.0165 

»A5_1 *A5 % = I implies the validity of A5_1 as the left inverse 
ans = 1.0000 -0.0000 

-0.0000 1.0000 

»A1_li = (A1'*A1)^-1*A1' %the left inverse of matrix A1 with M < N?
Warning: Matrix is close to singular or badly scaled. 

Results may be inaccurate. RCOND = 9.804831e-018. 

A1_li = -0.2500 0.2500 

0.2500 0 

0.5000 0.5000 

(Q12) Does a left inverse exist for a matrix that has fewer rows than columns?

(A12) No. For any N x M matrix X and any M x N matrix A with M < N, rank(XA) ≤ rank(A) ≤ M < N. Hence XA can never be a nonsingular N x N matrix, let alone an identity matrix. In this context, MATLAB should have rejected the above case on the ground that $[A_1^T A_1]$ is singular, so that its inverse does not exist. However, round-off errors can make a very small number look like zero, or a real zero look like a very small number (as will be mentioned in Remark 2.3). So it is not easy for MATLAB to tell a near-singularity from a real singularity. That is why MATLAB does not declare singularity outright. Instead, it issues only a warning message reminding you to check the validity of the result, so that it will not be blamed for a delusion. You must therefore keep in mind the condition in item 2 of Remark 1.1: for the left inverse to exist, the number of rows must not be less than the number of columns.

»A1_li*A1 %No identity matrix, since A1_li isn't the left inverse 
ans = 1.2500 0.7500 -0.2500

-0.2500 0.5000 0.7500 

1.5000 3.5000 2.5000 

»det(A1'*A1) %A1 is not left-invertible since A1'*A1 is singular


(cf) Let us be as nice to MATLAB as it is to us. In the spirit of mutual understanding, we acknowledge that MATLAB, as always, tries to please us with apparently good results. Sometimes it even pretends not to be troubled by the demon of ‘ill-condition’ so as not to make us feel uneasy. How kind MATLAB is! But we should always be careful not to be spoiled by its benevolence, and we should not take the computed results at face value. In this case, even though the matrix [A1'*A1] is singular and therefore not invertible, MATLAB tried to invert it anyway. MATLAB must have sensed something abnormal, as the ominous warning message printed before the result shows. Who would blame MATLAB for being so thoughtful and loyal to us? We might well be touched by its sincerity and smartness.

In the above statements, we see the slash (/) and backslash (\) operators. These operators perform right and left division, respectively. B/A is the same as B*inv(A), and A\B is the same as inv(A)*B, when A is invertible and the dimensions of A and B are compatible. Since B/A is equivalent to (A'\B')', let us take a close look at how the backslash (\) operator works.
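The following sketch checks these equivalences numerically for an invertible 2 x 2 matrix. It uses the entries of B from the examples above and an arbitrary matrix C chosen for illustration. The division operators are usually preferred to inv() because they solve the system without forming the inverse explicitly.

```matlab
% Right/left division versus explicit inversion for an invertible A
A = [1 -2; -3 0];             % invertible 2 x 2 (entries of B in the text)
C = [1 2; 3 4];               % arbitrary right-hand side
X1 = C/A;  X2 = C*inv(A);     % right division: solves X*A = C
Y1 = A\C;  Y2 = inv(A)*C;     % left division:  solves A*Y = C
X3 = (A'\C')';                % C/A expressed through the backslash
err = [norm(X1 - X2), norm(Y1 - Y2), norm(X1 - X3)]   % all round-off level
```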

»X = A1\A1 % an identity matrix? 

X = 1.0000 0 -0.8462 

0 1.0000 1.0769 

0 0 0 

(Q13) It seems that A1\A1 should have been an identity matrix, but it is not, contrary 
to our expectation. Why? 

(A13) We should know more about the various functions of the backslash(\), which 
can be seen by typing ‘help slash’ into the MATLAB Command window. Let 
Remark 1.2 answer this question in cooperation with the next case. 

»A1*X - A1 %zero if X is the solution to A1*X = A1?

ans = 1.0e-015 * 0 0 0

0 0 -0.4441 

Remark 1.2. The Function of Backslash (\) Operator. Overall, for the command 
‘A\B’, MATLAB finds a solution to the equation A*X = B. Let us denote the 
row/column dimension of the matrix A by M and N. 

1. If matrix A is square and upper/lower-triangular in the sense that all of 
its elements below/above the diagonal are zero, then MATLAB finds the 
solution by applying backward/forward substitution method (Section 2.2.1). 
2. If matrix A is square, symmetric (Hermitian), and positive definite, then
MATLAB finds the solution by using Cholesky factorization (Section 2.4.2). 

3. If matrix A is square and has no special feature, then MATLAB finds the 
solution by using LU decomposition (Section 2.4.1). 

4. If matrix A is rectangular, then MATLAB finds a solution by using QR 
factorization (Section 2.4.2). In case A is rectangular and of full rank with 
rank(A) = min(M,N), it will be the LS (least-squares) solution [Eq. (2.1.10)] 
for M > N (overdetermined case) and one of the many solutions that is not 
always the same as the minimum-norm solution [Eq. (2.1.7)] for M < N 
(underdetermined case). But for the case when A is rectangular and has 
rank deficiency, what MATLAB gives us may be useless. Therefore, you 
must pay attention to the warning message about rank deficiency, which 
might tell you not to count on the dead-end solution made by the backslash 
(\) operator. To find an alternative in the case of rank deficiency, you 
had better resort to singular value decomposition (SVD). See Problem 2.8 
for details. 
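The two rectangular cases of item 4 can be compared using small made-up systems. In the sketch below, the first check confirms that A\b agrees with the normal-equation least-squares solution when M > N. The second check shows that, for M < N, both A\b and pinv(A)*b satisfy the equation, while pinv(A)*b has the smallest norm. The exact vector that A\b returns in the underdetermined case may depend on the MATLAB version or on the environment, so the sketch relies only on its residual.

```matlab
% Backslash on rectangular full-rank systems (sketch with made-up data)
% Overdetermined (M > N): A\b returns the least-squares solution
A  = [1 0; 1 1; 1 2];   b = [1; 2; 2];
x_bs = A\b;
x_ne = (A'*A)\(A'*b);                 % normal-equation LS solution
ls_gap = norm(x_bs - x_ne)            % round-off level
% Underdetermined (M < N): A\b returns a solution, not necessarily the
% minimum-norm one; pinv(A)*b gives the minimum-norm solution
Au = [1 1 1];  bu = 3;
x_bs_u = Au\bu                        % one of infinitely many solutions
x_mn_u = pinv(Au)*bu                  % minimum-norm solution [1; 1; 1]
res = [norm(Au*x_bs_u - bu), norm(Au*x_mn_u - bu)]   % both solve Au*x = bu
```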

For the moment, let us continue to try more operations on matrices. 

»A1./A2 %termwise right division
ans = 1 1 1

-2 Inf 2

»A1.\A2 %termwise left division 
ans =1 1 1 

-0.5 0 0.5 

»format rat, B^-1 %represent the numbers (of B^-1) in fractional form
ans = 0 -1/3

-1/2 -1/6

»inv(B) %inverse matrix, equivalently
ans = 0 -1/3

-1/2 -1/6

»B.^-1 %termwise inversion (reciprocal of each element)
ans = 1 -1/2

-1/3 Inf

»B^2 %square of B, i.e., B^2 = B*B
ans = 7 -2

-3 6

»B.^2 %termwise square (square of each element)
ans = 1(b11^2) 4(b12^2)

9(b21^2) 0(b22^2)

»2.^B %2 to the power of each number in B
ans = 2(2^b11) 1/4(2^b12)

1/8(2^b21) 1(2^b22)

»A1.^A2 %each element of A1 to the power of the corresponding element of A2
ans = -1(A1(1,1)^A2(1,1)) 4(A1(1,2)^A2(1,2)) 27(A1(1,3)^A2(1,3))

1/16(A1(2,1)^A2(2,1)) 1(A1(2,2)^A2(2,2)) 2(A1(2,3)^A2(2,3))

»format short, exp(B) %e raised to each element of B, 4 digits below the decimal point
ans = 2.7183(e^b11) 0.1353(e^b12)

0.0498(e^b21) 1.0000(e^b22)

There are more useful MATLAB commands worth learning by heart.

Remark 1.3. More Useful Commands for Vector/Matrix Operations

1. We can use the commands zeros(), ones(), and eye() to construct a 
matrix of specified size or the same size as an existing matrix which has 
only zeros, only ones, or only ones/zeros on/off its diagonal. 

»Z = zeros(2,3) %or zeros(size(A1)) yielding a 2 x 3 zero matrix 
Z = 0 0 0 

0 0 0 

»E = ones(size(B)) %or ones(2,2) yielding a 2 x 2 matrix of ones
E = 1 1

1 1

»I = eye(2) %yielding a 2 x 2 identity matrix
I = 1 0

0 1

2. We can use the diag() command to make a column vector composed 
of the diagonal elements of a matrix or to make a diagonal matrix with 
on-diagonal elements taken from a vector given as the input argument. 

»A1, diag(A1) %column vector consisting of diagonal elements
A1 = -1 2 3 

4 5 2 

ans = -1 

5 

3. We can use the commands sum()/prod() to get the sum/product of the elements in a vector or a matrix. For a matrix, they work column by column, that is, along the first non-singleton dimension.

»sa1 = sum(a1) %sum of all the elements in vector a1
sa1 = 4 %sum_n a1(n) = -1 + 2 + 3 = 4

»sA1 = sum(A1) %sum of all the elements in each column of matrix A1
sA1 = 3 7 5 %sA1(n) = sum_m A1(m,n) = [-1+4 2+5 3+2]

»SA1 = sum(sum(A1)) %sum of all elements in matrix A1
SA1 = 15 %SA1 = sum_n sum_m A1(m,n) = 3 + 7 + 5 = 15

»pa1 = prod(a1) %product of all the elements in vector a1
pa1 = -6 %prod_n a1(n) = (-1) x 2 x 3 = -6

»pA1 = prod(A1) %product of all the elements in each column of matrix A1
pA1 = -4 10 6 %pA1(n) = prod_m A1(m,n) = [-1x4 2x5 3x2]

»PA1 = prod(prod(A1)) %product of all the elements of matrix A1
PA1 = -240 %PA1 = prod_n prod_m A1(m,n) = (-4) x 10 x 6 = -240

4. We can use the commands max () / min () to find the first maximum/minimum 
number and its index in a vector or in a matrix given as the input argument. 
»[aM,iM] = max(a2) 

aM = 5, iM = 2 %means that the max. element of vector a2 is a2(2) = 5 
»[AM, IM] = max(A1)

AM = 4 5 3

IM = 2 2 1

%means that the max. elements of each column of A1 are
%A1(2,1) = 4, A1(2,2) = 5, A1(1,3) = 3

»[AMx, J] = max(AM)

AMx = 5, J = 2

%implies that the max. element of A1 is A1(IM(J),J) = A1(2,2) = 5

5. We can use the commands rot90()/fliplr()/flipud() to rotate a matrix by an integer multiple of 90° and to flip it left-right/up-down.

»A1, A3 = rot90(A1), A4 = rot90(A1,-2)

A1 = -1 2 3

4 5 2

A3 = 3 2 %90° rotation

2 5

-1 4

A4 = 2 5 4 %90° x (-2) rotation

3 2 -1

»A5 = fliplr(A1) %flip left-right
A5 = 3 2 -1

2 5 4

»A6 = flipud(A1) %flip up-down
A6 = 4 5 2

-1 2 3

6. We can use the reshape() command to change the row-column size of a matrix while preserving its elements in column-major order (column by column).

»A7 = reshape(A1,3,2)

A7 = -1 5

4 3

2 2

»A8 = reshape(A1,6,1), A8 = A1(:) %makes supercolumn vector
A8 = -1

4

2

5

3

2
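Item 4 locates the overall maximum of a matrix in two stages. An alternative is to apply max() once to the supercolumn vector A1(:) from item 6 and convert the resulting linear index into a row/column pair with ind2sub(). This sketch shows one way to do that, not the only one.

```matlab
% Locating the overall maximum of a matrix with a single max() call
A1 = [-1 2 3; 4 5 2];
[Amax, k] = max(A1(:));          % k is a linear (column-major) index
[m, n] = ind2sub(size(A1), k)    % convert to (row, column)
check = isequal(A1(m,n), Amax)   % logical 1 (true)
```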


1.1.8 Random Number Generators 

MATLAB has the built-in functions rand()/randn() to generate random numbers with uniform/normal (Gaussian) distributions, respectively ([K-1], Chapter 22).


rand(M,N): generates an M x N matrix consisting of uniformly distributed 
random numbers 

randn(M,N): generates an M x N matrix consisting of normally distributed 
random numbers 
1. Random Number Having Uniform Distribution

The numbers in a matrix generated by the MATLAB function rand(M,N) have a uniform probability distribution over the interval [0,1], denoted U(0,1). The random number x generated by rand() has the probability density function

$$f_X(x) = u_s(x) - u_s(x-1) \quad \left(u_s(x) = \begin{cases} 1 & \forall x \ge 0 \\ 0 & \forall x < 0 \end{cases} : \text{unit step function}\right) \tag{1.1.1}$$

whose value is 1 over [0,1] and 0 elsewhere. The average of this standard uniform number x is

$$m_x = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^1 x\,dx = \frac{x^2}{2}\bigg|_0^1 = \frac{1}{2} \tag{1.1.2}$$

and its variance is

$$\sigma_x^2 = \int_{-\infty}^{\infty} (x - m_x)^2 f_X(x)\,dx = \int_0^1 \left(x - \frac{1}{2}\right)^2 dx = \frac{1}{3}\left(x - \frac{1}{2}\right)^3\bigg|_0^1 = \frac{1}{12} \tag{1.1.3}$$

If you want another random number y with uniform distribution U(a, b), transform the standard uniform number x as follows:

$$y = (b - a)x + a \tag{1.1.4}$$

For practice, we make a vector consisting of 1000 standard uniform numbers, 
transform it to make a vector of numbers with uniform distribution U(—1, +1), 
and then draw the histograms showing the shape of the distribution for the two 
uniform number vectors (Fig. 1.7a,b). 

»u_noise = rand(1000,1) %a 1000x1 noise vector with U(0,1) 
»subplot(221), hist(u_noise,20) %histogram having 20 divisions 
»u_noise1 = 2*u_noise-1 %a 1000x1 noise vector with U(-1,1)

»subplot (222), hist(u_noise1,20) %histogram 
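Eqs. (1.1.2) to (1.1.4) can also be checked empirically. In this sketch, the sample size N = 1e5 and the interval U(-1,1) are arbitrary choices. From Eqs. (1.1.2) to (1.1.4), the sample mean and variance of y should be close to (a+b)/2 and (b-a)^2/12. No seed is fixed, so the printed values change from run to run.

```matlab
% Empirical check of Eqs. (1.1.2)-(1.1.4) for uniform random numbers
a = -1;  b = 1;  N = 1e5;
x = rand(N, 1);                 % standard uniform U(0,1)
y = (b - a)*x + a;              % U(a,b) by Eq. (1.1.4)
mx = mean(x), vx = var(x)       % close to 1/2 and 1/12
my = mean(y), vy = var(y)       % close to (a+b)/2 = 0 and (b-a)^2/12 = 1/3
```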

2. Random Number with Normal (Gaussian) Distribution

The numbers in a matrix generated by the MATLAB function randn(M,N) have a normal (Gaussian) distribution with average $m = 0$ and variance $\sigma^2 = 1$, denoted N(0,1). The random number x generated by randn() has the probability density function

$$f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \tag{1.1.5}$$

If you want another Gaussian number y with a general normal distribution $N(m, \sigma^2)$, transform the standard Gaussian number x as follows:

$$y = \sigma x + m \tag{1.1.6}$$

The probability density function of the new Gaussian number is obtained by substituting $x = (y - m)/\sigma$ into Eq. (1.1.5) and dividing the result by the scale factor $\sigma$, as suggested by $dx = dy/\sigma$. This scaling ensures that the integral of the density function over the whole interval $(-\infty, +\infty)$ equals 1.

$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y-m)^2/(2\sigma^2)} \tag{1.1.7}$$

For practice, we make a vector of 1000 standard Gaussian numbers and transform it into a vector of numbers with normal distribution N(1, 1/4), that is, mean $m = 1$ and variance $\sigma^2 = 1/4$. We then draw the histograms for the two Gaussian number vectors (Fig. 1.7c,d).


»g_noise = randn(1000,1) %a 1000x1 noise vector with N(0,1) 
»subplot(223), hist(g_noise,20) %histogram having 20 divisions 
»g_noise1 = g_noise/2+1 %a 1000x1 noise vector with N(1,1/4)
»subplot (224), hist(g_noise1,20) %histogram 
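A similar sketch checks Eq. (1.1.6) with a large sample (N = 1e5 is an arbitrary choice). The sample mean and variance of y should be close to m and σ². About 68.3% of the samples should lie within one standard deviation of the mean, as the density (1.1.7) implies. Results vary from run to run because no seed is fixed.

```matlab
% Empirical check of Eq. (1.1.6): y = sigma*x + m is N(m, sigma^2)
m = 1;  sigma = 1/2;  N = 1e5;
x = randn(N, 1);              % standard Gaussian N(0,1)
y = sigma*x + m;              % N(1, 1/4), same as g_noise/2 + 1
my = mean(y), vy = var(y)     % close to 1 and 1/4
p1 = mean(abs(y - m) < sigma) % fraction within one sigma, about 0.683
```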


1.1.9 Flow Control 

1. if-end and switch-case-end Statements 

An if-end block basically consists of an if statement, a sequel part, and an end statement that closes the block. An if statement has a condition that is usually built from relational/logical operators (Table 1.4). It controls the program flow, that is, it determines which statements are executed and in what order, depending on whether or not the condition is met at run time. The sequel part consists of one or more statements. It may contain else or elseif statements, possibly in a nested structure that contains another if statement.

The switch-case-end block can neatly replace a multiple if-elseif-...-end statement.
Table 1.4 Relational Operators and Logical Operators

| Relational operator | Remark | Relational operator | Remark | Logical operator | Remark |
|---|---|---|---|---|---|
| < | less than | > | greater than | & | and |
| <= | less than or equal to | >= | greater than or equal to | \| | or |
| == | equal | ~= | not equal (≠) | ~ | not |

Let us see the following examples: 
Example 1. A Simple if-else-end Block 


%nm119_1: example of if-end block 
t = 0; 
if t > 0 
sgnt = 1; 
else 

sgnt = -1; 
end 


Example 2. A Simple if-elseif-end Block


%nm119_2: example of if-elseif-end block 
if t > 0 
sgnt = 1 
elseif t < 0 
sgnt = -1 
end 


Example 3. An if-elseif-else-end Block 


%nm119_3: example of if-elseif-else-end block 
if t > 0, sgnt = 1 
elseif t<0, sgnt = -1 
else sgnt = 0 
end 


Example 4. An if-elseif-elseif-...-else-end Block


%nm119_4: example of if-elseif-elseif-else-end block 
point = 85;
if point >= 90, grade = 'A'
elseif point >= 80, grade = 'B'
elseif point >= 70, grade = 'C'
elseif point >= 60, grade = 'D'
else grade = 'F'
end










Example 5. A switch-case-end Block 


%nm119_5: example of switch-case-end block
point = 85;
switch floor(point/10) %floor(x): integer less than or equal to x
case {9, 10}, grade = 'A' %a perfect score of 100 also gets 'A'
case 8, grade = 'B'
case 7, grade = 'C'
case 6, grade = 'D'
otherwise grade = 'F'
end


2. for index = i_0:increment:i_last-end Loop 

A for loop executes a block of statements repeatedly a specified number of times. Its loop index starts at i_0 and moves toward i_last by a specified step (increment), or by 1 if no step is specified, without passing i_last. The loop normally ends when the loop index reaches i_last, but a break statement inside the for loop can stop it earlier. A for loop with a positive (negative) increment is never iterated if the last value (i_last) of the index is smaller (greater) than the starting value (i_0).
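The index sequences produced by the loop header can be checked directly. In this sketch, the first loop uses a negative increment. The second loop has i_last < i_0 with the default increment of 1, so its body is never executed. The variable names are illustrative.

```matlab
% Loop-index sequences produced by i_0:increment:i_last
visited = [];
for k = 5:-2:0                 % negative increment: 5, 3, 1 (never passes 0)
   visited(end+1) = k;         % record each index value
end
visited
count = 0;
for k = 3:2                    % i_last < i_0 with positive increment
   count = count + 1;          % never executed
end
count                          % 0
```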

Example 6. A for Loop 


%nm119_6: example of for loop 
point = [76 85 91 65 87]; 
for n = 1 :length(point) 

if point(n) >=80, pf(n,:) = 'pass'; 

elseif point(n) >= 0, pf(n,:) = 'fail'; 
else %if point(n)< 0 
pf(n,:) = '????' ; 

fprintf('\n\a Something wrong with the data??\n'); 
break; 
end 
end 
pf


3. while Loop 

A while loop will be iterated as long as its predefined condition is satisfied and 
a break statement is not encountered inside the loop. 

Example 7. A while Loop 


%nm119_7: example of while loop 
r = 1; 

while r < 10 

r = input('\nType radius (or nonpositive number to stop):'); 
if r <= 0, break, end %isempty(r)| r <= 0, break, end 
v = 4/3*pi*r*r*r; 

fprintf('The volume of a sphere with radius %3.1f = %8.2f\n',r,v); 

end 
Example 8. while Loops to Find the Minimum/Maximum Positive Numbers

The following program “nm119_8.m” contains three while loops. In the first loop, x = 1 is repeatedly divided by 2 until just before it reaches zero, so that it should end up as the smallest positive number that can be represented in MATLAB. In the second loop, x = 1 is repeatedly multiplied by 2 until just before it reaches inf (the infinity defined in MATLAB). The result seems to be the largest positive number (x_max0) that can be represented in MATLAB. However, although this number would reach inf if multiplied by 2 once more, it is still not the largest finite number in MATLAB (which is slightly less than inf), and that largest number is what we want to find. How about multiplying x_max0 by $(2 - 1/2^n)$? In the third while loop, the temporary variable tmp starts at 1 and is repeatedly divided by 2 until just before x_max0*(2-tmp) reaches inf. The result is the largest positive number (x_max) that can be represented in MATLAB.


%nm119_8: example of while loops
x = 1; k1 = 0;
while x/2 > 0
   x = x/2; k1 = k1 + 1;
end
k1, x_min = x;
fprintf('x_min is %20.18e\n',x_min)

x = 1; k2 = 0;
while 2*x < inf
   x = x*2; k2 = k2+1;
end
k2, x_max0 = x;

tmp = 1; k3 = 0;
while x_max0*(2-tmp/2) < inf
   tmp = tmp/2; k3 = k3+1;
end
k3, x_max = x_max0*(2-tmp);
fprintf('x_max is %20.18e\n',x_max)

format long e
x_min,-x_min,x_max,-x_max
format hex
x_min,-x_min,x_max,-x_max %the same numbers in hexadecimal
format short
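The results of nm119_8 can be cross-checked against MATLAB's built-in constants. In this sketch, x_max is computed directly from the closed form (2 - 2^-52) x 2^1023 that the third loop reaches. Note that realmin is the smallest normalized number, 2^-1022. It is 2^52 times larger than the subnormal x_min found by the first loop.

```matlab
% Cross-check of the nm119_8 results with built-in constants
x = 1;
while x/2 > 0, x = x/2; end            % same as the first loop in nm119_8
x_min = x;
x_max = (2 - 2^-52)*2^1023;            % value reached by the third loop
same_min  = (x_min == 2^-1074)         % smallest (subnormal) positive double
same_max  = (x_max == realmax)         % largest finite double
nrm_min   = realmin                    % 2^-1022: smallest normalized number
ratio     = realmin/x_min              % 2^52
eps_check = (eps == 2^-52)             % spacing of doubles just above 1
```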


1.2 COMPUTER ERRORS VERSUS HUMAN MISTAKES 

Digital systems such as calculators and computers rarely make mistakes, since they follow the programmed instructions faithfully. Nonetheless, we often find numerical errors in the results they compute. Most of these errors come from representing numbers with a finite number of bits, which is an intrinsic limitation of the digital world. If you let the computer compute something without considering this so-called finite-word-length effect, you might get a strange answer. In that case, the blame for the wrong result lies not with the computer but with you, the user or programmer. We should therefore always take care not to let the computer produce a far-fetched output. In this section, we first see how the computer represents and stores numbers. Then we consider the causes of computational errors and how they propagate, so that we are not deceived by such unintended errors and, it is hoped, can take some measures against them.

1.2.1 IEEE 64-bit Floating-Point Number Representation 

MATLAB uses the IEEE 64-bit floating-point number system to represent all 
numbers. It has a word structure consisting of the sign bit, the exponent field, 
and the mantissa field as follows: 

| S (bit 63) | Exponent (bits 62 ... 52) | Mantissa (bits 51 ... 0) |

Each of these fields encodes S, E, or M of a number f, as described below.

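• Sign bit

$$S = b_{63} = \begin{cases} 0 & \text{for positive numbers} \\ 1 & \text{for negative numbers} \end{cases}$$

• Exponent field ($b_{62}b_{61}b_{60}\cdots b_{52}$): adopting the excess-1023 code

$$\begin{aligned} E &= \text{Exp} - 1023 = \{0, 1, \ldots, 2^{11} - 1 = 2047\} - 1023 = \{-1023, -1022, \ldots, +1023, +1024\} \\ &= \begin{cases} -1023 + 1 & \text{for } |f| < 2^{-1022} \ (\text{Exp} = 00000000000) \\ -1022 \sim +1023 & \text{for } 2^{-1022} \le |f| < 2^{1024} \ (\text{normalized ranges}) \\ +1024 & \text{for } \pm\infty \end{cases} \end{aligned}$$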

• Mantissa field ($b_{51}b_{50}\cdots b_1 b_0$):

In the un-normalized range, the numbers are so small that they can be represented only with the hidden bit set to 0. There, the number represented by the mantissa is

$$M = 0.b_{51}b_{50}\cdots b_1 b_0 = [b_{51}b_{50}\cdots b_1 b_0] \times 2^{-52} \tag{1.2.1}$$

You might think of the value of the hidden bit as being added to the exponent instead of to the mantissa.

In the normalized range, the number represented by the mantissa together with the hidden bit $b_h = 1$ is

$$\begin{aligned} M &= 1.b_{51}b_{50}\cdots b_1 b_0 = 1 + [b_{51}b_{50}\cdots b_1 b_0] \times 2^{-52} \\ &= 1 + b_{51} \times 2^{-1} + b_{50} \times 2^{-2} + \cdots + b_1 \times 2^{-51} + b_0 \times 2^{-52} \\ &= \{1,\ 1 + 2^{-52},\ 1 + 2 \times 2^{-52},\ \ldots,\ 1 + (2^{52} - 1) \times 2^{-52}\} \\ &= \{1,\ 1 + 2^{-52},\ 1 + 2 \times 2^{-52},\ \ldots,\ (2 - 2^{-52})\} \\ &= \{1,\ 1 + \Delta,\ 1 + 2\Delta,\ \ldots,\ 1 + (2^{52} - 1)\Delta = 2 - \Delta\} \quad (\Delta = 2^{-52}) \end{aligned} \tag{1.2.2}$$

The three quantities S, E, and M, represented by the sign bit S, the exponent field Exp, and the mantissa field M, respectively, together represent the number

$$f = \pm M \cdot 2^E \tag{1.2.3}$$
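To connect the bit fields with Eq. (1.2.3), the following sketch extracts S, Exp, and the 52 mantissa bits of a double using typecast() and bitshift()/bitand(), and then reassembles the value. It handles normalized numbers only, because the hidden bit is assumed to be 1. The value 17 is chosen because it reappears in the addition example later in this section.

```matlab
% Decoding the sign, exponent, and mantissa fields of a normalized double
f    = 17;
bits = typecast(f, 'uint64');                          % raw 64-bit pattern
S    = double(bitshift(bits, -63));                    % bit 63
Exp  = double(bitand(bitshift(bits, -52), uint64(2047)));  % bits 62..52
frac = double(bitand(bits, uint64(2^52 - 1)));         % bits 51..0
E    = Exp - 1023;                                     % excess-1023 code
M    = 1 + frac*2^-52;                                 % hidden bit 1
fprintf('S = %d, Exp = %d, E = %d, M = %.6f\n', S, Exp, E, M)
f_back = (-1)^S*M*2^E                                  % reassembles 17
```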

We classify the range of numbers according to the value (E) of the exponent and denote it as

$$R_E = [2^E, 2^{E+1}) \quad \text{with } -1022 \le E \le +1023 \tag{1.2.4}$$

In each range, the least unit is the value of the LSB (least significant bit), that is, the difference between two consecutive numbers represented by the 52-bit mantissa. It equals

$$\Delta_E = \Delta \times 2^E = 2^{-52} \times 2^E = 2^{E-52} \tag{1.2.5}$$
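The LSB value of Eq. (1.2.5) is exactly what the built-in eps(x) returns, namely the distance from x to the next larger double. The following sketch checks this for a few sample exponents. The list Es is arbitrary, and x = 1.5 x 2^E is simply a convenient point inside R_E. pow2() is used so that the powers of two are formed exactly.

```matlab
% eps(x) equals the LSB value Delta_E = 2^(E-52) for x in R_E
Es  = [-1022 -10 0 1 52 1023];   % sample exponents (arbitrary choice)
lsb = zeros(size(Es));
for k = 1:length(Es)
   x = 1.5*pow2(Es(k));          % a number inside R_E = [2^E, 2^(E+1))
   lsb(k) = eps(x);              % distance from x to the next larger double
end
expected = pow2(Es - 52);        % Delta_E from Eq. (1.2.5)
matches  = isequal(lsb, expected)   % logical 1 (true)
```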

Let us take a closer look at the bitwise representation of numbers belonging 
to each range. 

0. 0 (zero)

|S|000 ... 0000|0000 0000 ... 0000 0000|  ±0

1. Un-normalized Range (with the value of hidden bit $b_h = 0$)

$R_{-1023} = [2^{-1074}, 2^{-1022})$ with Exp = 0, E = Exp - 1023 + 1 = -1022

|S|000 ... 0000|0000 0000 ... 0000 0001|  (0 + 2^-52) x 2^-1022 = 2^-1074
...
|S|000 ... 0000|1111 1111 ... 1111 1111|  (0 + (2^52 - 1)2^-52) x 2^-1022

Value of LSB: $\Delta_{-1023} = 2^{-1022-52} = 2^{-1074}$

2. The Smallest Normalized Range (with the value of hidden bit $b_h = 1$)

$R_{-1022} = [2^{-1022}, 2^{-1021})$ with Exp = 1, E = Exp - 1023 = -1022

|S|000 ... 0001|0000 0000 ... 0000 0000|  (1 + 0) x 2^-1022 = 2^-1022
|S|000 ... 0001|0000 0000 ... 0000 0001|  (1 + 2^-52) x 2^-1022
...
|S|000 ... 0001|1111 1111 ... 1111 1111|  {1 + (2^52 - 1)2^-52 = 2 - 2^-52} x 2^-1022

Value of LSB: $\Delta_{-1022} = 2^{-1022-52} = 2^{-1074}$

3. Basic Normalized Range (with the value of hidden bit $b_h = 1$)

$R_0 = [2^0, 2^1)$ with Exp = 2^10 - 1 = 1023, E = Exp - 1023 = 0

|S|011 ... 1111|0000 0000 ... 0000 0000|  (1 + 0) x 2^0 = 1
|S|011 ... 1111|0000 0000 ... 0000 0001|  (1 + 2^-52) x 2^0
...
|S|011 ... 1111|1111 1111 ... 1111 1111|  {1 + (2^52 - 1)2^-52 = 2 - 2^-52} x 2^0

Value of LSB: $\Delta_0 = 2^{-52}$

4. The Largest Normalized Range (with the value of hidden bit $b_h = 1$)

$R_{1023} = [2^{1023}, 2^{1024})$ with Exp = 2^11 - 2 = 2046, E = Exp - 1023 = 1023

|S|111 ... 1110|0000 0000 ... 0000 0000|  (1 + 0) x 2^1023
|S|111 ... 1110|0000 0000 ... 0000 0001|  (1 + 2^-52) x 2^1023
...
|S|111 ... 1110|1111 1111 ... 1111 1111|  {1 + (2^52 - 1)2^-52 = 2 - 2^-52} x 2^1023

Value of LSB: $\Delta_{1023} = 2^{1023-52} = 2^{971}$

±∞ (inf): Exp = 2^11 - 1 = 2047, E = Exp - 1023 = 1024 (meaningless)

|0|111 ... 1111|0000 0000 ... 0000 0000|  +∞ ≠ (1 + 0) x 2^1024
|1|111 ... 1111|0000 0000 ... 0000 0000|  -∞ ≠ -(1 + 0) x 2^1024
|S|111 ... 1111|0000 0000 ... 0000 0001|  invalid (not used as a number; NaN)
...
|S|111 ... 1111|1111 1111 ... 1111 1111|  invalid (not used as a number; NaN)


From what has been mentioned earlier, we know that the minimum and maximum positive numbers are, respectively,

$$ f_{\min} = (0 + 2^{-52}) \times 2^{-1022} = 2^{-1074} = 4.9406564584124654 \times 10^{-324} $$
$$ f_{\max} = (2 - 2^{-52}) \times 2^{1023} = 1.7976931348623157 \times 10^{308} $$

This can be checked by running the program “nm119_8.m” in Section 1.1.9.
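The limits and bit patterns listed above can also be cross-checked with a few MATLAB built-ins. The sketch below is not one of the book's programs; it assumes num2hex(), which prints the 64 bits of a double as 16 hexadecimal digits, together with realmin (the smallest normalized number, $2^{-1022}$) and realmax ($f_{\max}$).

```matlab
% Hex bit patterns (sign bit + 11-bit Exp field + 52-bit mantissa) of some doubles
vals  = [2^(-1074), realmin, 1, realmax, Inf, -Inf];
names = {'2^-1074 (fmin)', 'realmin = 2^-1022', '1', 'realmax (fmax)', '+Inf', '-Inf'};
for i = 1:numel(vals)
    fprintf('%-18s %s\n', names{i}, num2hex(vals(i)));
end
fmin = (0 + 2^(-52))*2^(-1022);   % smallest positive (un-normalized) number
fmax = (2 - 2^(-52))*2^1023;      % largest finite number
fprintf('fmin = %.16e\nfmax = %.16e\n', fmin, fmax);
% One step beyond the limits: fmin/2 underflows to 0 and 2*fmax overflows to Inf
fprintf('fmin/2 = %g, 2*fmax = %g\n', fmin/2, 2*fmax);
```

With IEEE 754 doubles, the six patterns printed should be 0000000000000001, 0010000000000000, 3ff0000000000000, 7fefffffffffffff, 7ff0000000000000, and fff0000000000000. The first three hex digits hold the sign bit and the 11-bit Exp field, and the remaining 13 digits hold the mantissa field.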

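Now, in order to gain some idea about the arithmetic computational mechanism, let’s see how the addition of two numbers, 3 and 14, represented in the IEEE 64-bit floating-point number system, is performed.

Representation: $3 = 1.1_2 \times 2^{1}$ (Exp = 1024), $14 = 1.11_2 \times 2^{3}$ (Exp = 1026)

Alignment: $3 = 0.011_2 \times 2^{3}$ (Exp = 1026)

Addition: $0.011_2 + 1.110_2 = 10.001_2$, so that the sum is $10.001_2 \times 2^{3}$ (Exp = 1026)

Normalization: $10.001_2 \times 2^{3} = 1.0001_2 \times 2^{4}$ (Exp = 1027)

Binary-to-Decimal Conversion: $1.0001_2 \times 2^{1027 - 1023} = 10001_2 = 1 \times 2^{4} + 1 \times 2^{0} = 17_{10}$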
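The exponent fields used in this example can be read directly off the actual bit patterns. In the sketch below, expField is a hypothetical helper (an anonymous function introduced here, not a MATLAB built-in); it relies on typecast(), bitshift(), and bitand() applied to uint64 values.

```matlab
% Exponent field Exp (bits 62..52) of a double, read from its 64-bit pattern
expField = @(v) double(bitand(bitshift(typecast(v, 'uint64'), -52), uint64(2047)));
for v = [3 14 3+14]
    Exp = expField(v);
    fprintf('%2d: bits = %s (hex), Exp = %d, E = Exp - 1023 = %d\n', ...
            v, num2hex(v), Exp, Exp - 1023);
end
```

For 3, 14, and 17 it should report Exp = 1024, 1026, and 1027 (E = 1, 3, and 4), in agreement with the steps above.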
In the process of adding the two numbers, an alignment is made so that the two exponents in their 64-bit representations equal each other; this kicks out the part that is smaller by more than 52 bits, causing some numerical error. For example, adding $2^{-23}$ to $2^{30}$ does not make any difference, while adding $2^{-22}$ to $2^{30}$ does, as we can see by typing the following statements into the MATLAB Command window.

>>x = 2^30; x + 2^-22 == x, x + 2^-23 == x
ans = 0(false)   ans = 1(true)

(cf) Each range has a different minimum unit (LSB value) described by Eq. (1.2.5). It 
implies that the numbers are uniformly distributed within each range. The closer the 
range is to 0, the denser the numbers in the range are. Such a number representation 
makes the absolute quantization error large/small for large/small numbers, decreasing 
the possibility of large relative quantization error. 
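Eq. (1.2.5) can be checked with the built-in eps(x), which returns the distance from |x| to the next larger double, that is, the LSB value $\Delta_E$ of the range containing x. The sample points in this sketch are arbitrary; the two-output form of log2() is used to find E exactly.

```matlab
% LSB value (spacing) of doubles in the range R_E containing x, cf. Eq. (1.2.5)
xs = [2^(-1022), 1, 3e5, 2^30, 1e15];
for v = xs
    [~, e] = log2(v);          % v = f*2^e with 0.5 <= f < 1, so E = e - 1
    E = e - 1;
    fprintf('x = %-10.4g E = %6d  eps(x) = %-11.4g 2^(E-52) = %-11.4g eps(x)/x = %.3g\n', ...
            v, E, eps(v), 2^(E - 52), eps(v)/v);
end
```

The absolute spacing eps(x) grows with x, while eps(x)/x stays between about $1.1 \times 10^{-16}$ and $2.2 \times 10^{-16}$: the uniform relative precision described in the remark above.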

1.2.2 Various Kinds of Computing Errors 

There are various kinds of errors that we encounter when using a computer for 
computation. 

• Truncation Error: Caused by adding up only a finite number of terms, while in theory we should add infinitely many terms to get the exact answer.

• Round-off Error: Caused by representing/storing numeric data in finite bits.

• Overflow/Underflow: Caused by numbers too large or too small to be represented/stored properly in finite bits; more specifically, numbers having absolute values larger/smaller than the maximum ($f_{\max}$)/minimum ($f_{\min}$) number that can be represented in MATLAB.

• Negligible Addition: Caused by adding two numbers whose magnitudes differ by over 52 bits, as seen in the previous section.

• Loss of Significance: Caused by a “bad subtraction,” which means a subtraction of a number from another one that is almost equal in value.

• Error Magnification: Caused and magnified/propagated by multiplying/dividing a number containing a small error by a large/small number.

• Errors depending on the numerical algorithms, step size, and so on.
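Several of these error kinds can be provoked with one-line MATLAB experiments. The sketch below is purely illustrative; the particular numbers (truncating the series for e after the $k = 8$ term, the increment $d = 10^{-15}$) are arbitrary choices.

```matlab
% Tiny instances of some of the error kinds listed above
% Truncation: e = sum_{k=0}^{inf} 1/k!, truncated after k = 8
k = 0:8;
e_trunc = sum(1./factorial(k));
fprintf('truncation  : exp(1) - sum = %g\n', exp(1) - e_trunc);
% Round-off: 0.1, 0.2, and 0.3 have no exact binary representation
fprintf('round-off   : 0.1 + 0.2 - 0.3 = %g\n', 0.1 + 0.2 - 0.3);
% Overflow/underflow: beyond fmax, and below fmin = 2^-1074
fprintf('overflow    : 10*realmax = %g\n', 10*realmax);
fprintf('underflow   : 2^-1074/2 = %g\n', 2^(-1074)/2);
% Negligible addition: 1 and 2^-53 differ by 53 bits
fprintf('negligible  : (1 + 2^-53) - 1 = %g\n', (1 + 2^(-53)) - 1);
% Loss of significance: relative error of (1 + d) - 1 for a tiny d
d = 1e-15;
fprintf('loss of sig.: relative error of (1 + d) - 1 = %g\n', abs(((1 + d) - 1) - d)/d);
```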

Although we cannot be entirely free from these kinds of inevitable errors,
it is not computers, but instead human beings, who must be responsible for 
the computing errors. While our computer may insist on its innocence for an 
unintended lie, we programmers and users cannot escape from the responsibility 
of taking measures against the errors and would have to pay for being careless 
enough to be deceived by a machine. We should, therefore, try to decrease the 
magnitudes of errors and to minimize their impact on the final results. In order 
to do so, we must know the sources of computing errors and also grasp the 
computational properties of numerical algorithms. 
For instance, consider the following two formulas:

$$ f_1(x) = \sqrt{x}\left(\sqrt{x + 1} - \sqrt{x}\right), \qquad f_2(x) = \frac{\sqrt{x}}{\sqrt{x + 1} + \sqrt{x}} \tag{1.2.6} $$

These are theoretically equivalent, hence we expect them to give exactly the same value. However, running the MATLAB program “nm122.m” to compute the values of the two formulas, we see a surprising result: as x increases, the value of $f_1(x)$ incoherently moves hither and thither, while $f_2(x)$ approaches 1/2 at a steady pace. We might feel betrayed by the computer and doubt its reliability. Why does such a flustering thing happen with $f_1(x)$? It is because the number of significant bits abruptly decreases when the subtraction $(\sqrt{x + 1} - \sqrt{x})$ is performed for large values of x, which is called ‘loss of significance’. In order to take a close look at this phenomenon, let $x = 10^{15}$. Then we have


$$ \sqrt{x + 1} = 3.162277660168381 \times 10^{7} = 31622776.60168381 $$
$$ \sqrt{x} = 3.162277660168379 \times 10^{7} = 31622776.60168379 $$

These two numbers have 52 significant bits, or equivalently 16 significant digits ($2^{52} \approx 10^{52 \times 3/10} \approx 10^{15}$), so that their significant digits range from $10^{7}$ to $10^{-8}$. Accordingly, the least significant digit of their sum and difference is also the eighth digit after the decimal point ($10^{-8}$).


$$ \sqrt{x + 1} + \sqrt{x} = 63245553.20336761 $$
$$ \sqrt{x + 1} - \sqrt{x} = 0.00000001862645149230957 \approx 0.00000002 $$

Note that the number of significant digits of the difference decreased from 16 to 1. Could you imagine that a single subtraction may kill most of the significant digits? This is the very ‘loss of significance’, which is often called ‘catastrophic cancellation’.


%nm122
clear
f1 = @(x) sqrt(x)*(sqrt(x + 1) - sqrt(x)); % Eq. (1.2.6), bad subtraction
f2 = @(x) sqrt(x)./(sqrt(x + 1) + sqrt(x)); % Eq. (1.2.6), rationalized form
x = 1;
format long e
for k = 1:15
  fprintf('At x=%15.0f, f1(x)=%20.18f, f2(x)=%20.18f\n', x, f1(x), f2(x));
  x = 10*x;
end
% x = 1e15 here
sx1 = sqrt(x + 1); sx = sqrt(x); d = sx1 - sx; s = sx1 + sx;
fprintf('sqrt(x+1) = %25.13f, sqrt(x) = %25.13f\n', sx1, sx);
fprintf('diff=%25.23f, sum=%25.23f\n', d, s);
>>nm122
At x=              1, f1(x)=0.414213562373095150, f2(x)=0.414213562373095090
At x=             10, f1(x)=0.488088481701514750, f2(x)=0.488088481701515480
At x=            100, f1(x)=0.498756211208899460, f2(x)=0.498756211208902730
At x=           1000, f1(x)=0.499875062461021870, f2(x)=0.499875062460964860
At x=          10000, f1(x)=0.499987500624854420, f2(x)=0.499987500624960890
At x=         100000, f1(x)=0.499998750005928860, f2(x)=0.499998750006249940
At x=        1000000, f1(x)=0.499999875046341910, f2(x)=0.499999875000062490
At x=       10000000, f1(x)=0.499999987401150920, f2(x)=0.499999987500000580
At x=      100000000, f1(x)=0.500000005558831620, f2(x)=0.499999998749999950
At x=     1000000000, f1(x)=0.500000077997506340, f2(x)=0.499999999874999990
At x=    10000000000, f1(x)=0.499999441672116520, f2(x)=0.499999999987500050
At x=   100000000000, f1(x)=0.500004449631168080, f2(x)=0.499999999998750000
At x=  1000000000000, f1(x)=0.500003807246685030, f2(x)=0.499999999999874990
At x= 10000000000000, f1(x)=0.499194546973835970, f2(x)=0.499999999999987510
At x=100000000000000, f1(x)=0.502914190292358400, f2(x)=0.499999999999998720
sqrt(x+1) = 31622776.6016838100000, sqrt(x) = 31622776.6016837920000
diff=0.00000001862645149230957, sum=63245553.20336760600000000000000
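The loss of significance at $x = 10^{15}$ can also be measured in units of the LSB. Since $2^{24} \le \sqrt{x} < 2^{25}$, Eq. (1.2.5) gives an LSB of $2^{24-52} = 2^{-28}$ for both square roots, so their computed difference can only be a small integer multiple of $2^{-28}$. The sketch below compares it with the rationalized form $1/(\sqrt{x+1} + \sqrt{x})$, the same rewriting that turns $f_1(x)$ into $f_2(x)$.

```matlab
x = 1e15;
d_bad  = sqrt(x + 1) - sqrt(x);        % bad subtraction, as in f1(x)
d_good = 1/(sqrt(x + 1) + sqrt(x));    % rationalized form, as in f2(x)
ulp = eps(sqrt(x));                     % LSB value near 3.16e7: 2^(24-52) = 2^-28
fprintf('d_bad  = %.16e = %g LSBs\n', d_bad, d_bad/ulp);
fprintf('d_good = %.16e\n', d_good);
fprintf('relative error of d_bad: %.3g\n', abs(d_bad - d_good)/d_good);
```

With the output reported above (difference $= 1.862645149\ldots \times 10^{-8} = 5 \times 2^{-28}$), the computed difference is off by roughly 18% from the true value of about $1.58 \times 10^{-8}$.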


1.2.3 Absolute/Relative Computing Errors 

The absolute/relative error of an approximate value x to the true value X of a real-valued variable is defined as follows:

$$ e_x = X\ (\text{true value}) - x\ (\text{approximate value}) \tag{1.2.7} $$

$$ \rho_x = \frac{e_x}{X} = \frac{X - x}{X} \tag{1.2.8} $$

If the least significant digit (LSD) is the dth digit after the decimal point, then the magnitude of the absolute error is not greater than half the value of LSD.

$$ |e_x| = |X - x| \le \frac{1}{2} 10^{-d} \tag{1.2.9} $$

If the number of significant digits is s, then the magnitude of the relative error is not greater than half the relative value of LSD over MSD (most significant digit).

$$ |\rho_x| = \frac{|X - x|}{|X|} \le \frac{1}{2} 10^{-s+1} \tag{1.2.10} $$
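As a small worked example of Eqs. (1.2.7) to (1.2.10), take $X = \pi$ and its five-significant-digit approximation $x = 3.1416$, whose LSD is the fourth digit after the decimal point ($d = 4$, $s = 5$). The choice of $\pi$ is only an illustration.

```matlab
X = pi;  x = 3.1416;               % true value and its approximation
d = 4;   s = 5;                    % LSD position after the point, significant digits
e_x   = X - x;                     % absolute error, Eq. (1.2.7)
rho_x = e_x/X;                     % relative error, Eq. (1.2.8)
fprintf('e_x   = %.3e   (bound (1/2)*10^(-d)     = %.1e)\n', e_x, 0.5*10^(-d));
fprintf('rho_x = %.3e   (bound (1/2)*10^(-(s-1)) = %.1e)\n', rho_x, 0.5*10^(-(s-1)));
```

Here $e_x \approx -7.35 \times 10^{-6}$ and $\rho_x \approx -2.34 \times 10^{-6}$, both well within the bound $0.5 \times 10^{-4}$ of Eqs. (1.2.9) and (1.2.10).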


1.2.4 Error Propagation 

In this section we will see how the errors of two numbers, x and y, are propagated 
with the four arithmetic operations. Error propagation means that the errors in the 
input numbers of a process or an operation cause the errors in the output numbers. 

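Let their absolute errors be $e_x$ and $e_y$, respectively. Then the magnitudes of the absolute/relative errors in the sum and difference are

$$ e_{x \pm y} = (X \pm Y) - (x \pm y) = (X - x) \pm (Y - y) = e_x \pm e_y, \qquad |e_{x \pm y}| \le |e_x| + |e_y| \tag{1.2.11} $$

$$ |\rho_{x \pm y}| = \frac{|e_{x \pm y}|}{|X \pm Y|} \le \frac{|X||e_x/X| + |Y||e_y/Y|}{|X \pm Y|} = \frac{|X||\rho_x| + |Y||\rho_y|}{|X \pm Y|} \tag{1.2.12} $$

From this, we can see why the relative error is magnified to cause the “loss of significance” in the case of subtraction when the two numbers X and Y are almost equal so that $|X - Y| \approx 0$.

The magnitudes of the absolute and relative errors in the multiplication/division are

$$ |e_{xy}| = |XY - xy| = |XY - (X - e_x)(Y - e_y)| \approx |X e_y + Y e_x| \le |X||e_y| + |Y||e_x| \tag{1.2.13} $$

$$ |\rho_{xy}| = \frac{|e_{xy}|}{|XY|} \approx \frac{|X e_y + Y e_x|}{|XY|} \le \frac{|e_x|}{|X|} + \frac{|e_y|}{|Y|} = |\rho_x| + |\rho_y| \tag{1.2.14} $$

$$ |e_{x/y}| = \left|\frac{X}{Y} - \frac{x}{y}\right| = \left|\frac{X}{Y} - \frac{X - e_x}{Y - e_y}\right| = \frac{|Y e_x - X e_y|}{|Y(Y - e_y)|} \approx \frac{|Y e_x - X e_y|}{Y^2} \tag{1.2.15} $$

$$ |\rho_{x/y}| = \frac{|e_{x/y}|}{|X/Y|} \approx \frac{|Y e_x - X e_y|}{|XY|} \le |\rho_x| + |\rho_y| \tag{1.2.16} $$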


This implies that, in the worst case, the relative error in multiplication/division 
may be as large as the sum of the relative errors of the two numbers. 
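The contrast between Eq. (1.2.12) and Eq. (1.2.14) shows up clearly in a quick numerical experiment. In this sketch, X, Y, and the relative errors of about $\pm 10^{-10}$ are hypothetical values chosen so that X and Y are nearly equal and the errors have opposite signs (the worst case for subtraction); rounding in forming x and y makes the relative errors only approximately $\pm 10^{-10}$.

```matlab
X = 1.000001;  Y = 1;              % hypothetical true values, nearly equal
x = X*(1 - 1e-10);                 % approximation with rho_x of about +1e-10
y = Y*(1 + 1e-10);                 % approximation with rho_y of about -1e-10
rho_x = (X - x)/X;  rho_y = (Y - y)/Y;                          % Eq. (1.2.8)
rho_d = ((X - Y) - (x - y))/(X - Y);                            % relative error of x - y
bound_d = (abs(X)*abs(rho_x) + abs(Y)*abs(rho_y))/abs(X - Y);   % Eq. (1.2.12)
rho_p = (X*Y - x*y)/(X*Y);                                      % relative error of x*y
bound_p = abs(rho_x) + abs(rho_y);                              % Eq. (1.2.14)
fprintf('rho_x = %.3e, rho_y = %.3e\n', rho_x, rho_y);
fprintf('x - y : rho = %.3e, bound = %.3e\n', rho_d, bound_d);
fprintf('x * y : rho = %.3e, bound = %.3e\n', rho_p, bound_p);
```

The relative error of x - y comes out around $2 \times 10^{-4}$, over a million times larger than the input errors and essentially equal to the bound (1.2.12), whereas the relative error of xy stays below $|\rho_x| + |\rho_y|$.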


1.2.5 Tips for Avoiding Large Errors 

In this section we will look over several tips to reduce the chance of large errors 
occurring in calculations. 

First, in order to decrease the magnitude of round-off errors and to lower the 
possibility of overflow/underflow errors, make the intermediate result as close to 
1 as possible in consecutive multiplication/division processes. According to this 
rule, when computing xy/z, we program the formula as 

• (xy)/z when x and y in the multiplication are very different in magnitude,

• x(y/z) when y and z in the division are close in magnitude, and

• (x/z)y when x and z in the division are close in magnitude.

For instance, when computing $y^n/e^{nx}$ with $x \gg 1$ and $y \gg 1$, we would program it as $(y/e^x)^n$ rather than as $y^n/e^{nx}$, so that overflow/underflow can be avoided. You may verify this by running the following MATLAB program “nm125_1.m”.


%nm125_1:
x = 36; y = 1e16;
for n = [-20 -19 19 20]
  fprintf('y^%2d/e^%2dx = %25.15e\n', n, n, y^n/exp(n*x)); % may overflow/underflow
  fprintf('(y/e^x)^%2d = %25.15e\n', n, (y/exp(x))^n); % ratio kept near 1 first
end
>>nm125_1
y^-20/e^-20x = 0.000000000000000e+000
(y/e^x)^-20 = 4.920700930263814e-008
y^-19/e^-19x = 1.141367814854768e-007
(y/e^x)^-19 = 1.141367814854769e-007
y^19/e^19x = 8.761417546430845e+006
(y/e^x)^19 = 8.761417546430843e+006
y^20/e^20x = NaN
(y/e^x)^20 = 2.032230802424294e+007


Second, in order to prevent ‘loss of significance’, it is important to avoid a ‘bad subtraction’ (Section 1.2.2), that is, a subtraction of a number from another number having almost equal value. Let us consider a simple problem of finding the roots of a second-order equation $ax^2 + bx + c = 0$ by using the quadratic formula

$$ x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \qquad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a} \tag{1.2.17} $$

Let $|4ac| \ll b^2$. Then, depending on the sign of b, a “bad subtraction” may be encountered when we try to find $x_1$ or $x_2$, whichever is the smaller one of the two roots. This implies that it is safe from the “loss of significance” to compute the root having the larger absolute value first and then obtain the other root by using the relation (between the roots and the coefficients) $x_1 x_2 = c/a$.
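The strategy just described can be sketched as follows. The coefficients a = 1, b = 1e8, c = 1 are an arbitrary example satisfying $|4ac| \ll b^2$; the roots are about $-10^{-8}$ and $-10^{8}$. The variable names x_small and x_big are introduced here only for illustration.

```matlab
a = 1; b = 1e8; c = 1;             % example coefficients with |4ac| << b^2
sq = sqrt(b^2 - 4*a*c);
x1_naive = (-b + sq)/(2*a);        % Eq. (1.2.17): -b + sq is a bad subtraction for b > 0
% Root of larger magnitude first: -b and -sign(b)*sq have the same sign, no cancellation
if b >= 0
    x_big = (-b - sq)/(2*a);
else
    x_big = (-b + sq)/(2*a);
end
x_small = c/(a*x_big);             % other root from x1*x2 = c/a
fprintf('x1 (naive) = %.15e\n', x1_naive);
fprintf('x_small    = %.15e\n', x_small);
fprintf('x_big      = %.15e\n', x_big);
```

With b > 0, the naive $x_1$ from (1.2.17) should come out near $-7.45 \times 10^{-9}$, losing about a quarter of its value to cancellation, while c/(a*x_big) recovers $-1.0 \times 10^{-8}$ to full precision.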

For another instance, we consider the following two formulas, which are analytically the same, but numerically different:

$$ f_1(x) = \frac{1 - \cos x}{x^2}, \qquad f_2(x) = \frac{\sin^2 x}{x^2(1 + \cos x)} \tag{1.2.18} $$

It is safe to use $f_1(x)$ for $x \approx \pi$ since the term $(1 + \cos x)$ in $f_2(x)$ is a ‘bad subtraction’, while it is safe to use $f_2(x)$ for $x \approx 0$ since the term $(1 - \cos x)$ in $f_1(x)$ is a ‘bad subtraction’. Let’s run the following MATLAB program “nm125_2.m” to confirm this. Below is the running result. This implies that we might use some formulas to avoid a ‘bad subtraction’.


%nm125_2: round-off error test
f1 = @(x) (1 - cos(x))/x/x; % bad subtraction near x = 0
f2 = @(x) sin(x)*sin(x)/x/x/(1 + cos(x)); % bad subtraction near x = pi
for k = 0:1
  x = k*pi; tmp = 1;
  for k1 = 1:8
    tmp = tmp*0.1; x1 = x + tmp;
    fprintf('At x = %10.8f, ', x1)
    fprintf('f1(x) = %18.12e; f2(x) = %18.12e\n', f1(x1), f2(x1));
  end
end
>>nm125_2
At x = 0.10000000, f1(x) = 4.995834721974e-001; f2(x) = 4.995834721974e-001
At x = 0.01000000, f1(x) = 4.999958333474e-001; f2(x) = 4.999958333472e-001
At x = 0.00100000, f1(x) = 4.999999583255e-001; f2(x) = 4.999999583333e-001
At x = 0.00010000, f1(x) = 4.999999969613e-001; f2(x) = 4.999999995833e-001
At x = 0.00001000, f1(x) = 5.000000413702e-001; f2(x) = 4.999999999958e-001
At x = 0.00000100, f1(x) = 5.000444502912e-001; f2(x) = 5.000000000000e-001
At x = 0.00000010, f1(x) = 4.996003610813e-001; f2(x) = 5.000000000000e-001
At x = 0.00000001, f1(x) = 0.000000000000e+000; f2(x) = 5.000000000000e-001
At x = 3.24159265, f1(x) = 1.898571371550e-001; f2(x) = 1.898571371550e-001
At x = 3.15159265, f1(x) = 2.013534055392e-001; f2(x) = 2.013534055391e-001
At x = 3.14259265, f1(x) = 2.025133720884e-001; f2(x) = 2.025133720914e-001
At x = 3.14169265, f1(x) = 2.026294667803e-001; f2(x) = 2.026294678432e-001
At x = 3.14160265, f1(x) = 2.026410772244e-001; f2(x) = 2.026410604538e-001
At x = 3.14159365, f1(x) = 2.026422382785e-001; f2(x) = 2.026242248740e-001
At x = 3.14159275, f1(x) = 2.026423543841e-001; f2(x) = 2.028044503269e-001
At x = 3.14159266, f1(x) = 2.026423659946e-001; f2(x) = Inf


It may be helpful for avoiding a ‘bad subtraction’ to use the Taylor series expansion ([W-1]) rather than using the exponential function directly for the computation of $e^x$. For example, suppose we want to find

$$ f_3(x) = \frac{e^x - 1}{x} \quad \text{at } x = 0 \tag{1.2.19} $$

We can use the Taylor series expansion up to just the fourth order of $g(x) = e^x$ about $x = 0$

$$ g(x) = e^x \approx g(0) + g'(0)x + \frac{g''(0)}{2!}x^2 + \frac{g^{(3)}(0)}{3!}x^3 + \frac{g^{(4)}(0)}{4!}x^4 = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} $$

to approximate the above function (1.2.19) as

$$ f_3(x) = \frac{e^x - 1}{x} \approx 1 + \frac{x}{2!} + \frac{x^2}{3!} + \frac{x^3}{4!} = f_4(x) \tag{1.2.20} $$

Noting that the true value of (1.2.19) is computed to be 1 by using L’Hôpital’s rule ([W-1]), we run the MATLAB program “nm125_3.m” to find which one of the two formulas $f_3(x)$ and $f_4(x)$ is better for finding the value of the expression (1.2.19) at x = 0. Would you compare them based on the running result shown below? How can the approximate formula $f_4(x)$ outrun the true one $f_3(x)$ for the numerical purpose, though this is unusual? It is because the zero factors in the numerator/denominator of $f_3(x)$ are canceled to set $f_4(x)$ free from the terror of a “bad subtraction.”
%nm125_3: reduce the round-off error using Taylor series
f3 = @(x) (exp(x) - 1)/x; % Eq. (1.2.19), bad subtraction near x = 0
f4 = @(x) ((x/4 + 1)*x/3 + 1)*x/2 + 1; % Eq. (1.2.20) in nested form
x = 0; tmp = 1;
for k1 = 1:12
  tmp = tmp*0.1; x1 = x + tmp;
  fprintf('At x = %14.12f, ', x1)
  fprintf('f3(x) = %18.12e; f4(x) = %18.12e\n', f3(x1), f4(x1));
end

>>nm125_3
At x = 0.100000000000, f3(x) = 1.051709180756e+000; f4(x) = 1.051708333333e+000
At x = 0.010000000000, f3(x) = 1.005016708417e+000; f4(x) = 1.005016708333e+000
At x = 0.001000000000, f3(x) = 1.000500166708e+000; f4(x) = 1.000500166708e+000
At x = 0.000100000000, f3(x) = 1.000050001667e+000; f4(x) = 1.000050001667e+000
At x = 0.000010000000, f3(x) = 1.000005000007e+000; f4(x) = 1.000005000017e+000
At x = 0.000001000000, f3(x) = 1.000000499962e+000; f4(x) = 1.000000500000e+000
At x = 0.000000100000, f3(x) = 1.000000049434e+000; f4(x) = 1.000000050000e+000
At x = 0.000000010000, f3(x) = 9.999999939225e-001; f4(x) = 1.000000005000e+000
At x = 0.000000001000, f3(x) = 1.000000082740e+000; f4(x) = 1.000000000500e+000
At x = 0.000000000100, f3(x) = 1.000000082740e+000; f4(x) = 1.000000000050e+000
At x = 0.000000000010, f3(x) = 1.000000082740e+000; f4(x) = 1.000000000005e+000
At x = 0.000000000001, f3(x) = 1.000088900582e+000; f4(x) = 1.000000000001e+000


1.3 TOWARD GOOD PROGRAM 


Among the various criteria about the quality of a general program, the most 
important one is how robust its performance is against the change of the problem 
properties and the initial values. A good program guides the program users who 
don’t know much about the program and at least gives them a warning message
without runtime error for their minor mistake. There are many other features 
that need to be considered, such as user friendliness, compactness and elegance, 
readability, and so on. But, as far as the numerical methods are concerned, the 
accuracy of solution, execution speed (time efficiency), and memory utilization 
(space efficiency) are of utmost concern. Since some tips to achieve the accuracy 
or at least to avoid large errors (including overflow/underflow) are given in the 
previous section, we will look over the issues of execution speed and memory 
utilization. 


1.3.1 Nested Computing for Computational Efficiency 

The execution speed of a program for a numerical solution depends mostly on the number of function (subroutine) calls and arithmetic operations performed in the program. Therefore, we prefer algorithms requiring fewer function calls and arithmetic operations. For instance, suppose we want to evaluate the value of a polynomial

$$ p_4(x) = a_1 x^4 + a_2 x^3 + a_3 x^2 + a_4 x + a_5 \tag{1.3.1} $$

It is better to use the nested structure (as below) than to use the above form as it is.

$$ p_{4n}(x) = (((a_1 x + a_2)x + a_3)x + a_4)x + a_5 \tag{1.3.2} $$

Note that the numbers of multiplications needed in Eqs. (1.3.2) and (1.3.1) are 4 and (4 + 3 + 2 + 1 = 10), respectively. This point is illustrated by the program “nm131_1.m”, where a polynomial $\sum_{i=0}^{N} a_i x^i$ of degree $N = 10^6$ for a certain value of x is computed by using three methods: Eq. (1.3.1), Eq. (1.3.2), and the MATLAB built-in function ‘polyval()’. Interested readers could run this program to see that Eq. (1.3.2), the nested multiplication, is the fastest, while ‘polyval()’ is the slowest because of some overhead time for being called, though it is also implemented in a nested structure.


%nm131_1: nested multiplication vs. plain multiple multiplication
N = 1000000+1; a = [1:N]; x = 1;
tic % initialize the timer
p = sum(a.*x.^[N-1:-1:0]); %plain multiplication
p, toc % measure the time passed from the time of executing 'tic'
tic, pn = a(1);
for i = 2:N %nested multiplication
  pn = pn*x + a(i);
end
pn, toc
tic, polyval(a,x), toc
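Besides timing the methods as nm131_1 does, the operation counts quoted above can be checked by instrumenting both forms. This sketch uses the arbitrary example coefficients a = [2 -3 0 5 -1] (that is, $p_4(x) = 2x^4 - 3x^3 + 5x - 1$) at x = 1.5, and counts each multiplication by x explicitly.

```matlab
a = [2 -3 0 5 -1]; x = 1.5;              % example coefficients a1..a5 and point
N = numel(a) - 1;                        % degree of the polynomial
% Plain form (1.3.1): the term a(i)*x^(N-i+1) costs N-i+1 multiplications
p_plain = 0; nmul_plain = 0;
for i = 1:N+1
    term = a(i);
    for m = 1:N-i+1, term = term*x; nmul_plain = nmul_plain + 1; end
    p_plain = p_plain + term;
end
% Nested form (1.3.2): one multiplication per step
p_nest = a(1); nmul_nest = 0;
for i = 2:N+1
    p_nest = p_nest*x + a(i); nmul_nest = nmul_nest + 1;
end
fprintf('plain : p = %g with %d multiplications\n', p_plain, nmul_plain);
fprintf('nested: p = %g with %d multiplications\n', p_nest, nmul_nest);
fprintf('polyval(a,x) = %g\n', polyval(a, x));
```

Both forms, as well as polyval(), return 6.5, with 10 and 4 multiplications, respectively.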


Programming in a nested structure is not only recommended for time-efficient computation, but may also be critical to the solution. For instance, consider a problem of finding the value

$$ S(K) = \sum_{k=0}^{K} \frac{\lambda^k}{k!} e^{-\lambda} \quad \text{for } \lambda = 100 \text{ and } K = 155 \tag{1.3.3} $$


%nm131_2_1: nested structure
lam = 100; K = 155;
p = exp(-lam); % term for k = 0
S = p;
for k = 1:K
  p = p*lam/k; S = S + p; % p = lam^k/k!*exp(-lam)
end
S

%nm131_2_2: not nested structure
lam = 100; K = 155;
S = 1; % term lam^0/0! for k = 0
for k = 1:K
  p = lam^k/factorial(k);
  S = S + p;
end
S*exp(-lam)


The above two programs are made for this computational purpose. Noting that this sum of Poisson probabilities is close to 1 for such a large K, we can run them to find that one works fine, while the other gives a quite wrong result. Could you tell which one is better?
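One way to approach the question is to look at the sizes of the intermediate quantities each program has to handle. The sketch below simply prints them for $\lambda = 100$ and $K = 155$; compare them with $f_{\max} \approx 1.8 \times 10^{308}$ from Section 1.2.1.

```matlab
lam = 100; K = 155;
fprintf('lam^K        = %g\n', lam^K);          % numerator of the last term in nm131_2_2
fprintf('factorial(K) = %g\n', factorial(K));   % denominator of the last term in nm131_2_2
fprintf('exp(-lam)    = %g\n', exp(-lam));
% Largest term p_k = lam^k/k!*exp(-lam) reached by the recursion p_k = p_(k-1)*lam/k
p = exp(-lam); pmax = p;
for k = 1:K
    p = p*lam/k; pmax = max(pmax, p);
end
fprintf('largest term = %g\n', pmax);
```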

1.3.2 Vector Operation Versus Loop Iteration 

It is time-efficient to use vector operations rather than loop iterations to perform a repetitive job for an array of data. The following program “nm132_1.m” compares a vector operation versus a loop iteration in terms of the execution speed. Could you tell which one is faster?


%nm132_1: vector operation vs. loop iteration
N = 100000; th = [0:N-1]/50000*pi;
tic
ss = sin(th(1));
for i = 2:N, ss = ss + sin(th(i)); end % loop iteration
toc, ss
tic
ss = sum(sin(th)); % vector operation
toc, ss


As a more practical example, let us consider a problem of finding the DtFT (discrete-time Fourier transform) ([W-3]) of a given sequence x[n].

$$ X(\Omega) = \sum_{n=0}^{N-1} x[n] e^{-j\Omega n} \quad \text{for } \Omega = [-100:100]\pi/100 \tag{1.3.4} $$


The following program “nm132_2.m” compares a vector operation versus a loop 
iteration for computing the DtFT in terms of the execution speed. Could you tell 
which one is faster? 


%nm132_2: nested structure
N = 1000; x = rand(1,N); % a random sequence x[n] for n = 0:N-1
W = [-100:100]*pi/100; % frequency range
tic
for k = 1:length(W)
  X1(k) = 0; %for for loop
  for n = 1:N, X1(k) = X1(k) + x(n)*exp(-j*W(k)*(n-1)); end
end
toc
tic
X2 = 0;
for n = 1:N %for vector loop
  X2 = X2 + x(n)*exp(-j*W*(n-1));
end
toc
discrepancy = norm(X1-X2) %the two results should agree
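The vector operation can be pushed one step further by building an N x 201 matrix of exponentials and forming all 201 sums of Eq. (1.3.4) in a single vector-matrix product. The sketch below is an illustrative variant, not part of nm132_2; it trades memory (about N·201 complex numbers) for speed and cross-checks its result against the vector-loop version.

```matlab
N = 1000; x = rand(1, N);          % random sequence x[n], n = 0:N-1
W = [-100:100]*pi/100;             % frequency points Omega
n = (0:N-1).';                     % time indices as a column
tic
X3 = x*exp(-1j*n*W);               % (1 x N)*(N x 201): all sums of Eq. (1.3.4) at once
toc
X2 = 0;                            % vector-loop version from nm132_2 for comparison
for k = 1:N
    X2 = X2 + x(k)*exp(-1j*W*(k-1));
end
discrepancy = norm(X3 - X2)
```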
1.3.3 Iterative Routine Versus Nested Routine

In this section we compare an iterative routine and a nested routine performing the same job. Consider the following two programs fctrl1(n)/fctrl2(n), whose common objective is to get the factorial of a given nonnegative integer k.

$$ k! = k(k-1)\cdots 2 \cdot 1 \tag{1.3.5} $$

They differ in their structure. While fctrl1() uses a for loop structure, fctrl2() uses the nested (recursive) calling structure, in which a program uses itself as a subroutine to perform a sub-job. Compared with fctrl1(), fctrl2() is easier to program as well as to read, but is subject to runtime error caused by the excessive use of stack memory as the number of recursive calls increases with large n. Another disadvantage of fctrl2() is that it is time-inefficient, because the number of function calls increases with the input argument (n). In this case, a professional programmer would consider the standpoint of users to determine the programming style. Some algorithms like the adaptive integration (Section 5.8), however, may fit the nested structure perfectly.


function m = fctrl1(n)
m = 1;
for k = 2:n, m = m*k; end

function m = fctrl2(n)
if n <= 1, m = 1;
else m = n*fctrl2(n-1);
end


1.3.4 To Avoid Runtime Error 

A good program guides the program users who don’t know much about the 
program and at least gives them a warning message without runtime error for 
their minor mistake. If you don’t know what runtime error is, you can experience 
one by taking the following steps: 

1. Make and save the above routine fctrl1() in an M-file named ‘fctrl.m’ in a directory listed in the MATLAB search path.

2. Type fctrl(-1) into the MATLAB Command window. Then you will see

>>fctrl(-1)
ans = 1

This seems to imply that (−1)! = 1, which is not true. It is caused by the mistake
of the user who tries to find (−1)! without knowing that it is not defined. This
kind of runtime error seems to be minor because it does not halt the process. 
But it needs special attention because it may not be easy to detect. If you are a 
good programmer, you will insert some error handling statements in the program 
fctrl() as below. Then, when someone happens to execute fctrl(-1) in the 
Command window or through an M-file, the execution stops and he will see the 
error message in the Command window as 


??? Error using ==> fctrl 

The factorial of negative number ?? 
function m = fctrl(n)

if n < 0, error('The factorial of negative number ??'); 
else m = 1; for k = 2:n, m = m*k; end 
end 


This shows the error message (given as the input argument of the error() 
routine) together with the name of the routine in which the accidental “error” 
happens, which is helpful for the user to avoid the error. 

Most common runtime errors are caused by an “out of domain” index of array 
and the violation of matrix dimension compatibility, as illustrated in Section 1.1.7. 
For example, consider the gauss (A, B) routine in Section 2.2.2, whose job is to 
solve a system of linear equations Ax = b for x. To appreciate the role of the fifth 
line handling the dimension compatibility error in the routine, remove the line 
(by putting the comment mark % before the line in the M-file defining gauss ()) 
and type the following statements in the Command window: 

>>A = rand(3,3); B = rand(2,1); x = gauss(A,B)
??? Index exceeds matrix dimensions.
Error in ==> C:\MATLAB6p5\nma\gauss.m
On line 10 ==> AB = [A(1:NA,1:NA) B(1:NA,1:NB)];

Then MATLAB gives you an error message together with the suspicious statement line and the routine name. But it is hard to figure out what causes the runtime error, and you may get nervous lest the routine should have some bug. Now, restore the fifth line in the routine and type the same statements in the Command window:

>>x = gauss(A,B)
??? Error using ==> gauss
A and B must have compatible dimension

This error message (provided by the programmer of the routine) helps you to realize that the source of the runtime error is the incompatible matrices/vectors A and B given as the input arguments to the gauss() routine. Just like this, a good program has a scenario for possible user mistakes and fires the error routine for each abnormal condition to show the user the corresponding error message.

Many users often give more/fewer input arguments than they are supposed to give to the MATLAB functions/routines and sometimes give wrong types/formats of data to them. To experience this type of error, let us try using the MATLAB function sinc1(t,D) (Section 1.3.5) to plot the graph of a sinc function

$$ \text{sinc}(t/D) = \frac{\sin(\pi t/D)}{\pi t/D} \quad \text{with } D = 0.5 \text{ and } t = [-2, 2] \tag{1.3.6} $$

For this purpose, type the following statements in the Command window.

Figure 1.8 The graphs of a sinc function defined by sinc1().


>>D = 0.5; b1 = -2; b2 = 2; t = b1 + [0:200]/200*(b2 - b1);
>>plot(t,sinc1(t,D)), axis([b1 b2 -0.4 1.2])
>>hold on, plot(t,sinc1(t),'k:')

The two plotting commands coupled with sinc1(t,D) and sinc1(t) yield the two beautiful graphs, respectively, as depicted in Fig. 1.8a. It is important to note that sinc1() doesn’t bother us and works fine without the second input argument D. We owe this nice error-handling service to the second line of the function sinc1():

if nargin < 2, D = 1; end

This line takes care of the case where the number of input arguments (nargin) is less than 2, by assuming that the second input argument is D = 1 by default. This programming technique is the key to making the MATLAB functions adaptive to different numbers/types of input arguments, which is very useful for breathing user-convenience into the MATLAB functions. To appreciate its role, we remove the second line from the M-file defining sinc1() and then type the same statement in the Command window, trying to use sinc1() without the second input argument.

>>plot(t,sinc1(t),'k:')
??? Input argument 'D' is undefined.
Error in ==> C:\MATLAB6p5\nma\sinc1.m
On line 4 ==> x = sin(pi*t/D)./(pi*t/D);

This time we get a serious (red) error message with no graphic result. This implies that a MATLAB function without the appropriate error-handling parts no longer tolerates the user’s omission or carelessness.

Now, consider the third line in sinc1(), which is another error-handling statement.

t(find(t==0))=eps;

or, equivalently

for i = 1:length(t), if t(i) == 0, t(i) = eps; end, end 

This statement changes every zero element in the t vector into eps (2.2204e-016). What is the real purpose of this statement? It is actually to remove the possibility of division-by-zero in the next statement, which is a mathematical expression having t in the denominator.

x = sin(pi*t/D)./(pi*t/D); 

To appreciate the role of the third line in sinc1(), we remove it from the M-file defining sinc1(), and type the following statement in the Command window.

>>plot(t,sinc1(t,D),'r')
Warning: Divide by zero.
(Type "warning off MATLAB:divideByZero" to suppress this warning.)
> In C:\MATLAB6p5\nma\sinc1.m at line 4

This time we get just a warning (black) error message with a similar graphic result, as depicted in Fig. 1.8b. Does it imply that the third line is dispensable? No, because the graph has a (weird) hole at t = 0, which would make most engineers/mathematicians feel uncomfortable. That’s why the authors strongly recommend that you not omit such an error-handling part as the third line, as well as the second line, in the MATLAB function sinc1().

(cf) What is the value of sinc1(t,D) for t = 0 in this case? Aren’t you curious? If so, let’s go for it.

>>sinc1(0,D), sin(pi*0/D)/(pi*0/D), 0/0
ans = NaN (Not-a-Number: undetermined)
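The effect of replacing t = 0 by eps can be seen directly: for a tiny argument u, sin(u) evaluates to u within rounding, so the quotient returns 1, the limiting value of the sinc function at t = 0. This sketch uses logical indexing t(t == 0), which does the same job as t(find(t==0)); D = 0.5 and the three sample points are arbitrary.

```matlab
D = 0.5;
t = [-0.5 0 0.5];                  % sample points including t = 0
x_raw = sin(pi*t/D)./(pi*t/D);     % 0/0 = NaN at t = 0
t(t == 0) = eps;                   % same job as t(find(t==0)) = eps
x_fix = sin(pi*t/D)./(pi*t/D);     % sin(u)/u with u = 2*pi*eps gives 1
disp([x_raw; x_fix])
```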

Last, consider the fourth line in sinc1(), which is the only essential statement performing the main job.

x = sin(pi*t/D)./(pi*t/D); 

What is the . (dot) before / (division operator) for? In reference to this, the authors gave you a piece of advice that you had better put a . (dot) just before the arithmetic operators * (multiplication), / (division), and ^ (power) in the function definition so that the term-by-term (termwise) operation can be done any time (Section 1.1.6, (A5)). To appreciate the existence of the . (dot), we remove it from the M-file defining sinc1(), and type the following statements in the Command window.

>>clf, plot(t,sinc1(t,D)), sinc1(t,D), sin(pi*t/D)/(pi*t/D)
ans = -0.0187
What do you see in the graphic window on the screen? Surprise, a (horizontal) straight line running parallel with the t-axis, far from any sinc function graph! What is more surprising, the value of sinc1(t,D) or sin(pi*t/D)/(pi*t/D) shows up as a scalar. The authors hope that this accident will help you realize how important it is for correct term-by-term operations to put a . (dot) before the arithmetic operators *, /, and ^. By the way, aren’t you curious about how MATLAB deals with a vector division without . (dot)? If so, let’s try the following statements:

>>A = [1:10]; B = 2*A; A/B, A*B'*(B*B')^-1, A*pinv(B)
ans = 0.5

To understand this response of MATLAB, you can see Section 1.1.7 or Section 2.1.2.
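For row vectors, A/B (mrdivide) returns the scalar c that solves c*B = A in the least-squares sense, that is, $c = (AB^{T})/(BB^{T})$; with B = 2*A the fit is exact and c = 0.5. The sketch below uses non-proportional vectors (an arbitrary example) to show that the three expressions still agree.

```matlab
A = [1 2 3]; B = [1 1 1];          % non-proportional row vectors (example)
c1 = A/B;                          % least-squares solution of c*B = A
c2 = A*B'/(B*B');                  % normal-equation formula
c3 = A*pinv(B);                    % pseudo-inverse
disp([c1 c2 c3])
```

Here all three give 2, the multiple of B = [1 1 1] that best matches A = [1 2 3] in the least-squares sense.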

In this section we looked at several sources of runtime error, hoping to draw the reader’s attention to the danger of runtime errors.


1.3.5 Parameter Sharing via Global Variables 

When we discussed the runtime error that may be caused by a user’s failure to pass some parameter as an input argument to the corresponding function, you might have felt that the parameter passing job is troublesome. Okay, that is understandable for a beginner in MATLAB. How about declaring the parameters as global so that they can be accessed/shared from anywhere in the MATLAB world as far as the declaration is valid? If you want to, you can declare any variable(s) by inserting the following statement in both the main program and all the functions using the variables.

global Gravity_Constant Dielectric_Constant 


%plot_sinc
clear, clf
global D
D = 1; b1 = -2; b2 = 2;
t = b1 + [0:100]/100*(b2 - b1);
%passing the parameter(s) through arguments of the function
subplot(221), plot(t, sinc1(t,D))
axis([b1 b2 -0.4 1.2])
%passing the parameter(s) through global variables
subplot(222), plot(t, sinc2(t))
axis([b1 b2 -0.4 1.2])

function x = sinc1(t,D)
if nargin<2, D = 1; end
t(find(t == 0)) = eps;
x = sin(pi*t/D)./(pi*t/D);

function x = sinc2(t)
global D
t(find(t == 0)) = eps;
x = sin(pi*t/D)./(pi*t/D);


Then how convenient it would be, since you would not have to bother passing the parameters! But as you become proficient in programming and handle many functions/routines that involve various sets of parameters, you may find that global variables are not always convenient, for the following reasons.

• Once a variable is declared as global, its value can be changed in any of the MATLAB functions that have declared it as global, without being noticed by the other related functions. Therefore it is usual to declare only constants as global and to give them long names in all capital letters for easy identification.

• If some variables are declared as global and modified by several functions/routines, it is not easy to see the relationships and interactions among the related functions in terms of the global variables. In other words, program readability gets worse as the number of global variables and related functions increases.

For example, look over the above program "plot_sinc.m" and the function "sinc2()". Both declare D as global; consequently, sinc2() does not need a second input argument to get the parameter D. If you run the program, you will see that the two plotting statements using sinc1() and sinc2() produce the same graphic result, as depicted in Fig. 1.8a.


1.3.6 Parameter Passing Through Varargin 

In this section we look at two kinds of routines that take a function name (string) together with its parameters as input arguments and work with that function.

First, look over the routine "ez_plot1()", which takes a function name (ftn), the lower/upper bounds (bounds = [b1 b2]), and the function's parameter (p) as its first, second, and third input arguments, respectively, and plots the graph of the given function over the interval set by the bounds. Since the given function may or may not have a parameter, the two cases are distinguished by the number of input arguments (nargin) in the if-else-end block.


%plot_sinc1
clear, clf
D = 1; b1 = -2; b2 = 2;
t = b1 + [0:100]/100*(b2 - b1);
bounds = [b1 b2];
subplot(223), ez_plot1('sinc1',bounds,D)
axis([b1 b2 -0.4 1.2])
subplot(224), ez_plot('sinc1',bounds,D)
axis([b1 b2 -0.4 1.2])

function ez_plot1(ftn,bounds,p)
if nargin < 2, bounds = [-1 1]; end
b1 = bounds(1); b2 = bounds(2);
t = b1 + [0:100]/100*(b2 - b1);
if nargin <= 2, x = feval(ftn,t);
else x = feval(ftn,t,p);
end
plot(t,x)

function ez_plot(ftn,bounds,varargin)
if nargin < 2, bounds = [-1 1]; end
b1 = bounds(1); b2 = bounds(2);
t = b1 + [0:100]/100*(b2 - b1);
x = feval(ftn,t,varargin{:});
plot(t,x)
Now look at the routine "ez_plot()", which does the same plotting job as "ez_plot1()". Note that its last input argument is the MATLAB keyword varargin (variable-length argument list), which it passes on as the last input argument of the MATLAB built-in function feval(). Since varargin can represent multiple comma-separated parameters, including expressions and strings, it makes it easy to relay parameters from one routine to the next. As the number of parameters increases, it becomes much more convenient to pass them through varargin than to handle them one by one as in ez_plot1(). This technique will be used widely later in Chapter 4 (on nonlinear equations), Chapter 5 (on numerical integration), Chapter 6 (on ordinary differential equations), and Chapter 7 (on optimization).
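The same relay idea can be sketched at script level with an anonymous function that accepts varargin, instead of the M-file ez_plot(). The names eval_on_grid, gauss, and square are hypothetical, and the plotting step is left out so that only the parameter passing is shown.

```matlab
% Evaluate ftn at 101 points over bounds, relaying any extra parameters.
eval_on_grid = @(ftn, bounds, varargin) ...
    feval(ftn, bounds(1) + (0:100)/100*(bounds(2) - bounds(1)), varargin{:});

gauss  = @(t, m, s) exp(-(t - m).^2/2/s^2);   % needs two parameters
square = @(t) t.^2;                           % needs none

x1 = eval_on_grid(gauss,  [-1 1], 0, 1);      % m = 0, s = 1 are relayed
x2 = eval_on_grid(square, [-1 1]);            % nothing is relayed
```

The caller never needs to know how many parameters the target function takes; varargin{:} simply expands to whatever was supplied, including nothing.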

(cf) Note that MATLAB has a built-in graphic function ezplot(), which is much more 
powerful and convenient to use than ez_plot (). You can type ‘help ezplot’ to see 
its function and usage. 

1.3.7 Adaptive Input Argument List 

A MATLAB function/routine is said to be "adaptive" to users in terms of input arguments if it accepts different numbers/types of input arguments and interprets them reasonably. For example, consider the nonlinear equation solver routine 'newton()' in Section 4.4. Its input argument list is

(f,df,x0,tol,kmax)

where f, df, x0, tol, and kmax denote the filename (string) of the function (to be solved), the filename (string) of its derivative function, the initial guess (for the solution), the error tolerance, and the maximum number of iterations, respectively. Suppose the user, not knowing the derivative, tries to use the routine with just four input arguments as follows.

>>newton(f,x0,tol,kmax)

At first, these four input arguments will be accepted as f, df, x0, and tol, respectively. But when the second line of the program body is executed, the routine notices that something is wrong, since df is not a filename but a number, and then reinterprets the input arguments as f, x0, tol, and kmax, as the user intended. This allows the user to use the routine in two ways, depending on whether or not he supplies the derivative function. This scheme is conceptually quite similar to function overloading in C++, but C++ requires several functions with the same name and different argument lists.
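The script below imitates this adaptive interpretation. It is a sketch, not the newton() routine of Section 4.4: the cell array args, the variable names, and the fallback central-difference derivative with step h = 1e-6 are assumptions made for illustration.

```matlab
f = @(x) x.^2 - 2;                 % equation f(x) = 0 with root sqrt(2)
args = {f, 1.5, 1e-10, 50};        % user omitted df: f, x0, tol, kmax

% Adapt: if the 2nd argument is a number, it cannot be the derivative,
% so shift the remaining arguments and mark df as missing.
if isnumeric(args{2})
    args = [args(1), {[]}, args(2:end)];
end
[f, df, x0, tol, kmax] = args{:};

if isempty(df)                     % assumed fallback: numerical derivative
    h = 1e-6;
    df = @(x) (f(x + h) - f(x - h))/(2*h);
end

x = x0;                            % Newton iteration x <- x - f(x)/f'(x)
for k = 1:kmax
    dx = -f(x)/df(x);
    x = x + dx;
    if abs(dx) < tol, break; end
end
```

Called with five arguments instead (f, df, x0, tol, kmax), the isnumeric() test is false and the arguments are used as given.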

PROBLEMS 

1.1 Creating a Data File and Retrieving/Plotting Data Saved in a Data File 
(a) Using the MATLAB editor, make a program "nm1p01a", which lets its user input data pairs of heights [ft] and weights [lb] of as many persons as he wants until he presses <Enter>, and saves the whole data in the form of an N x 2 matrix into an ASCII data file (***.dat) named by the user. If you have no idea how to compose such a program, you can permute the statements in the box below to make your program. Store the program in the file named "nm1p01a.m" and run it to save the following data into the data file named "hw.dat":

5.5 162
6.1 185
5.7 170
6.5 195
6.2 191


%nm1p01a: input data pairs and save them into an ASCII data file

h = input('Enter height: ')
x(k,2) = input('Enter weight: ')
if isempty(h), break; end

cd('c:\matlab6p5\work') %change current working directory
filename = input('Enter filename(.dat):','s');
filename = [filename '.dat']; %string concatenation
save(filename,'x','-ascii')


(b) Make a MATLAB program "nm1p01b", which reads (loads) the data file "hw.dat" made in (a), plots the data as in Fig. 1.1a in the upper-left region of the screen divided into four regions like Fig. 1.3, and plots the data in the form of a piecewise-linear (PWL) graph describing the relationship between height and weight in the upper-right region of the screen. Let each data pair be denoted by the symbol '+' on the graph. Also let the ranges of height and weight be [5, 7] and [160, 200], respectively. If you have no idea, you can permute the statements in the box below. Then run the program to check that it works.


%nm1p01b: to read the data file and plot the data

cd('c:\matlab6p5\work') %change current working directory
weight = hw(I,2);
load hw.dat
clf, subplot(221)
plot(hw)
subplot(222)
axis([5 7 160 200])
plot(height,weight,height,weight,'+')
[height,I] = sort(hw(:,1));
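To try the save/load round trip of Problem 1.1 without typing data interactively, the sketch below stores five height/weight pairs (5.5 ft/162 lb, 6.1/185, 5.7/170, 6.5/195, 6.2/191) in a temporary file obtained from tempname, instead of the book's c:\matlab6p5\work folder and hw.dat.

```matlab
% Height [ft] / weight [lb] pairs as an N x 2 matrix.
x = [5.5 162; 6.1 185; 5.7 170; 6.5 195; 6.2 191];

filename = [tempname '.dat'];     % temporary file instead of hw.dat
save(filename, 'x', '-ascii');    % plain text, one row per line
hw = load(filename);              % load returns the numeric matrix
delete(filename);

[height, I] = sort(hw(:,1));      % sort by height for the PWL graph
weight = hw(I,2);                 % reorder weights to match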
1.2 Text Printout of Alphanumeric Data

Make a routine max_array(A), which uses the max() command to find one of the maximum elements of a matrix A given as its input argument and uses the fprintf() command to print it onto the screen together with its row/column indices in the following format.

'\n Max(A) is A(%2d,%2d) = %5.2f\n',row_index,col_index,maxA

Additionally, use it to print, in this format, the maximum element of an arbitrary matrix generated by the following two consecutive commands.

>>rand('state',sum(100*clock)), rand(3)
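One way to obtain both the maximum and its position is sketched below with the built-in functions max() and ind2sub(), applied to the column-stacked A(:); the fixed test matrix is only an example.

```matlab
A = [0.2 0.9 0.4; 0.7 0.1 0.95; 0.3 0.5 0.6];
[maxA, lin_index] = max(A(:));                        % max over all elements
[row_index, col_index] = ind2sub(size(A), lin_index); % linear -> (row, col)
fprintf('\n Max(A) is A(%2d,%2d) = %5.2f\n', row_index, col_index, maxA)
```

Because A(:) stacks the columns, the linear index must be converted back to a row/column pair with ind2sub(); here the maximum 0.95 sits at A(2,3).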

1.3 Plotting the Mesh Graph of a Two-Dimensional Function 

Consider the MATLAB program "nm1p03a", whose objective is to draw a cone.

(a) The statement on the sixth line seems to be dispensable. Run the program with and without this line and see what happens.

(b) If you want to plot the function fcone(x,y) defined in another M-file 'fcone.m', how will you modify this program?

(c) If you replace the fifth line by 'Z = 1-abs(X)-abs(Y);', what difference does it make?


%nm1p03a: to plot a cone
clear, clf
x = -1:0.02:1; y = -1:0.02:1;
[X,Y] = meshgrid(x,y);
Z = 1-sqrt(X.^2+Y.^2);
Z = max(Z,zeros(size(Z)));
mesh(X,Y,Z)

function z = fcone(x,y)
z = 1-sqrt(x.^2 + y.^2);


1.4 Plotting the Mesh Graph of a Stratigraphic Structure

Consider the incomplete MATLAB program "nm1p04", whose objective is to draw the stratigraphic structure of the area around Pennsylvania State University from several perspective viewpoints. The data about the depth of the rock layer at 5 x 5 sites are listed in Table P1.4. Supplement the incomplete parts of the program so that it serves its purpose, and run the program to answer the following questions. If you complete it properly and run it, MATLAB will show you four similar graphs at the four corners of the screen and wait for you to press any key.
(a) At what value of k does MATLAB show you the mesh/surface-type graphs that are most similar to the first graphs? From this result, what do you guess are the default values of the azimuth (horizontal rotation angle) and the vertical elevation angle (in degrees) of the perspective viewpoint?

(b) As the first input argument Az of the command view(Az,El) decreases, in which direction does the perspective viewpoint revolve around the z-axis, clockwise or counterclockwise (seen from above)?

(c) As the second input argument El of the command view(Az,El) increases, does the perspective viewpoint move up or down along the z-axis?

(d) What is the difference between the plotting commands mesh() and meshc()?

(e) What is the difference between using the command view() with two input arguments Az,El and with a three-dimensional vector argument [x,y,z]?

Table P1.4 The Depth of the Rock Layer

                      x Coordinate
y Coordinate    0.1    1.2    2.5    3.6    4.8
    0.5         410    390    380    420    450
    1.4         395    375    410    435    455
    2.2         365    405    430    455    470
    3.5         370    400    420    445    435
    4.6         385    395    410    395    410


%nm1p04: to plot a stratigraphic structure
clear, clf
x = [0.1 .];
y = [0.5 .];
Z = [410 390 .];
[X,Y] = meshgrid(x,y);
subplot(221), mesh(X,Y,500 - Z)
subplot(222), surf(X,Y,500 - Z)
subplot(223), meshc(X,Y,500 - Z)
subplot(224), meshz(X,Y,500 - Z)
pause
for k = 0:7
  Az = -12.5*k; El = 10*k; Azr = Az*pi/180; Elr = El*pi/180;
  subplot(221), view(Az,El)
  subplot(222), k, view([sin(Azr),-cos(Azr),tan(Elr)]), pause %pause(1)
end
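The loop above gives view() a direction vector built from Az and El. Assuming MATLAB's convention that view(Az,El) looks at the axes from the direction (sin Az cos El, -cos Az cos El, sin El), dividing by cos El gives exactly the vector [sin(Azr), -cos(Azr), tan(Elr)] used in the loop. The check below recovers Az and El from that vector for k = 0, ..., 7; the names V, az_back, and el_back are illustrative.

```matlab
k = 0:7;
Az = -12.5*k;  El = 10*k;                    % same angles as the loop above
Azr = Az*pi/180;  Elr = El*pi/180;
V = [sin(Azr); -cos(Azr); tan(Elr)];         % one view vector per column

% Horizontal angle measured from the negative y-axis, and elevation
% above the x-y plane, recovered from each direction vector.
az_back = atan2(V(1,:), -V(2,:))*180/pi;
el_back = atan2(V(3,:), hypot(V(1,:), V(2,:)))*180/pi;
```

Only the direction of the vector matters to view(), so scaling it by 1/cos El does not change the viewpoint.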


1.5 Plotting a Function over an Interval Containing Its Singular Points

Noting that the tangent function f(x) = tan(x) is singular at x = π/2 and 3π/2, let us plot its graph over [0, 2π] as follows.







(a) Define the domain vector x consisting of sufficiently many intermediate points x_i along the x-axis and the corresponding vector y consisting of the function values at the x_i's, and plot the vector y over the vector x. You may use the following statements.

>>x = [0:0.01:2*pi]; y = tan(x);
>>subplot(221), plot(x,y)

Which of the graphs depicted in Fig. P1.5 is most similar to what you got? Is it far from what you expected?

(b) Hoping to get a better graph, we scale it up along the y-axis by using the following command.

>>axis([0 6.3 -10 10])

Which of the graphs depicted in Fig. P1.5 is most similar to what you got? Is it closer to what you expected than what you got in (a)?

(c) Most probably, you are bothered by the straight lines at the singular points x = π/2 and x = 3π/2. The more these lines, which should not be there, disturb you, the better your numerical instincts. As an alternative that avoids such a singular happening, you can try dividing the interval into three sections that exclude the two singular points, as follows.

>>x1 = [0:0.01:pi/2-0.01]; x2 = [pi/2+0.01:0.01:3*pi/2-0.01];
>>x3 = [3*pi/2+0.01:0.01:2*pi];
>>y1 = tan(x1); y2 = tan(x2); y3 = tan(x3);
>>subplot(222), plot(x1,y1,x2,y2,x3,y3), axis([0 6.3 -10 10])

(d) Try adjusting the number of intermediate points within the plotting interval as follows.

>>x1 = [0:200]*pi/100; y1 = tan(x1);
>>x2 = [0:400]*pi/200; y2 = tan(x2);
>>subplot(223), plot(x1,y1), axis([0 6.3 -10 10])
>>subplot(224), plot(x2,y2), axis([0 6.3 -10 10])

From the difference between the two graphs, you might have guessed that it would help to increase the number of intermediate points. Do you still think so after you adjust the range of the y-axis to [-50, +50] with the following command?

>>axis([0 6.3 -50 50])

(e) How about trying the easy plotting command ezplot()? Does it give you what you want?

>>ezplot('tan(x)',[0,2*pi])
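Another common remedy, different from the interval splitting in (c), is to replace the huge function values near the singular points with NaN: plot() does not draw line segments to or from NaN points, so the spurious vertical lines disappear. The clipping level 10 below is an assumption chosen to match the axis range used in (b) to (d).

```matlab
x = 0:0.01:2*pi;
y = tan(x);
y_masked = y;
y_masked(abs(y) > 10) = NaN;    % points outside the displayed range -> gaps
% subplot(221), plot(x, y_masked), axis([0 6.3 -10 10]) would then draw
% three separate branches without the vertical connecting lines.
```

Unlike splitting the interval by hand, this works without knowing where the singular points are.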

1.6 Plotting the Graph of a Sinc Function
The sinc function is defined as

$$f(x) = \frac{\sin x}{x} \qquad \text{(P1.6.1)}$$

whose value at x = 0 is

$$f(0) = \lim_{x \to 0} \frac{\sin x}{x} = \left. \frac{(\sin x)'}{(x)'} \right|_{x=0} = 1 \qquad \text{(P1.6.2)}$$

We are going to plot the graph of this function over [-4π, +4π].
(a) Casually, you may try as follows.

>>x = [-100:100]*pi/25; y = sin(x)./x;
>>plot(x,y), axis([-15 15 -0.4 1.2])

In spite of the warning message about 'division-by-zero', you may somehow get a graph. But is there anything odd about the graph?
(b) How about trying a different domain vector?

>>x = [-4*pi:0.1:+4*pi]; y = sin(x)./x;
>>plot(x,y), axis([-15 15 -0.4 1.2])

Surprisingly, MATLAB gives us the function values without any complaint and presents a nice graph of the sinc function. What is the difference between (a) and (b)?

(cf) Actually, we would have no problem if we used the MATLAB built-in function sinc(). Note, however, that sinc(x) computes sin(πx)/(πx), so sin(x)/x corresponds to sinc(x/pi).
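The zero-replacement trick used in sinc1() earlier in this chapter applies here as well. The sketch below contrasts it with the direct evaluation of (a).

```matlab
x = (-100:100)*pi/25;          % domain of (a): x(101) is exactly 0
y_naive = sin(x)./x;           % 0/0 gives NaN at x = 0

x_safe = x;
x_safe(x_safe == 0) = eps;     % same trick as in sinc1()
y_safe = sin(x_safe)./x_safe;  % sin(eps)/eps evaluates to 1
```

Replacing the single zero by eps changes nothing visible in the graph but removes the 0/0 point, and the computed value there agrees with (P1.6.2).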


1.7 Termwise (Element-by-Element) Operation in In-Line Functions

(a) Let the function f1(x) be defined without one or both of the dot(.) operators in Section 1.1.6. Could we still get the output vector consisting of the function values for the several values in the input vector? You can type the following statements into the MATLAB command window and see the results.

>>f1 = inline('1./(1+8*x^2)','x'); f1([0 1])
>>f1 = inline('1/(1+8*x.^2)','x'); f1([0 1])

(b) Let the function f1(x) be defined with both of the dot(.) operators as in Section 1.1.6. What would we get by typing the following statements into the MATLAB command window?

>>f1 = inline('1./(1+8*x.^2)','x'); f1([0 1]')
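In MATLAB 7 and later the same experiment can be repeated with anonymous functions (inline() is listed as not recommended in recent releases). The sketch below, with the hypothetical names f_dot and f_nodot, shows that only the fully termwise definition accepts a vector argument.

```matlab
f_dot   = @(x) 1./(1 + 8*x.^2);   % termwise: works element by element
f_nodot = @(x) 1/(1 + 8*x^2);     % matrix power x^2 needs a square matrix

y_row = f_dot([0 1]);             % row in -> row out
y_col = f_dot([0 1]');            % column in -> column out

failed = false;
try
    f_nodot([0 1]);               % [0 1]^2 is not defined for a 1x2 matrix
catch
    failed = true;
end
```

The dotted version returns [1 1/9] in whatever orientation the input has, while the undotted one stops with an error.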

1.8 In-Line Function and M-file Function with the Integral Routine 'quad()'

As will be seen in Section 5.8, one of the MATLAB built-in functions for computing an integral is 'quad()', the usual usage of which is

quad(f,a,b,tol,trace,p1,p2,..)

which computes

$$\int_a^b f(x, p_1, p_2, \ldots)\,dx \qquad \text{(P1.8.1)}$$

where

f          is the name of the integrand function (an M-file name should be enclosed in quotes ' ')
a, b       are the lower/upper bounds of the integration interval
tol        is the error tolerance (10^-6 by default [ ])
trace      set to 1(on)/0(off) (0 by default [ ]) for subintervals
p1, p2,..  are additional parameters to be passed directly to function f


Let's use this quad() routine with an in-line function and an M-file function to obtain

$$\int_{m-10}^{m+10} (x - x_0)\, f(x)\,dx \qquad \text{(P1.8.2a)}$$

where

$$x_0 = 1, \qquad f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-m)^2/2\sigma^2} \quad \text{with } m = 1,\ \sigma = 2 \qquad \text{(P1.8.3)}$$

Below are an incomplete main program 'nm1p08' and an M-file function defining the integrand of (P1.8.2a). Make another M-file defining the integrand of (P1.8.2b) and complete the main program to compute the two integrals (P1.8.2a) and (P1.8.2b) by using the in-line/M-file functions.


function xfx = xGaussian_pdf(x,m,sigma,x0)
xfx = (x - x0).*exp(-(x - m).^2/2/sigma^2)/sqrt(2*pi)/sigma;

%nm1p08: to try using quad() with in-line/M-file functions
clear
m = 1; sigma = 2;
int_xGausspdf = quad('xGaussian_pdf',m - 10,m + 10,[],0,m,sigma,1)
Gpdf = 'exp(-(x-m).^2/2/sigma^2)/sqrt(2*pi)/sigma';
xGpdf = inline(['(x - x0).*' Gpdf],'x','m','sigma','x0');
int_xGpdf = quad(xGpdf,m - 10,m + 10,[],0,m,sigma,1)
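For a quick check of the answer, the same integrals can be written with anonymous functions and the newer built-in integral() (introduced in MATLAB R2012a). Because x0 = m = 1 here, the integrand (x - x0)f(x) is odd about x = m and the interval [m - 10, m + 10] is symmetric about m, so the result should be essentially zero; the handle names are illustrative.

```matlab
m = 1; sigma = 2; x0 = 1;
Gpdf  = @(x) exp(-(x - m).^2/2/sigma^2)/sqrt(2*pi)/sigma;   % f(x) of (P1.8.3)
xGpdf = @(x) (x - x0).*Gpdf(x);                             % integrand of (P1.8.2a)

int_xGpdf = integral(xGpdf, m - 10, m + 10);   % odd about m -> about 0
int_Gpdf  = integral(Gpdf,  m - 10, m + 10);   % probability within m +/- 10
```

The second integral should match erf(10/(sqrt(2)*sigma)), the probability of a Gaussian falling within five standard deviations of its mean.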


1.9 μ-Law Function Defined in an M-File

The so-called μ-law function and μ^-1-law function used for non-uniform quantization are defined as

$$y = g_\mu(x) = |y|_{\max}\, \frac{\ln(1 + \mu |x|/|x|_{\max})}{\ln(1 + \mu)}\, \mathrm{sign}(x) \qquad \text{(P1.9a)}$$

$$x = g_\mu^{-1}(y) = \frac{|x|_{\max}}{\mu} \left[ (1 + \mu)^{|y|/|y|_{\max}} - 1 \right] \mathrm{sign}(y) \qquad \text{(P1.9b)}$$

Below are the μ-law function mulaw() defined in an M-file and a main program nm1p09, which performs the following jobs:

• Finds the values y of the μ-law function for x = [-1:0.01:1] and plots the graph of y versus x.

• Finds the values x0 of the μ^-1-law function for y.

• Computes the discrepancy between x and x0.

Complete the μ^-1-law function mulaw_inv() and store it together with mulaw() and nm1p09 in the M-files named "mulaw_inv.m", "mulaw.m", and "nm1p09.m", respectively. Then run the main program nm1p09 to plot the graphs of the μ-law function with μ = 10, 50, and 255 and find the discrepancy between x and x0.
function [y,xmax] = mulaw(x,mu,ymax)
xmax = max(abs(x));
y = ymax*log(1+mu*abs(x/xmax))./log(1+mu).*sign(x); % Eq.(P1.9a)

function x = mulaw_inv(y,mu,xmax)


%nm1p09: to plot the mulaw curve
clear, clf
x = [-1:.005:1];
mu = [10 50 255];
for i = 1:3
  [y,xmax] = mulaw(x,mu(i),1);
  x0 = mulaw_inv(y,mu(i),xmax);
  plot(x,y,'b-', x,x0,'r-'), hold on
  discrepancy = norm(x-x0)
end
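A compact way to check Eqs. (P1.9a) and (P1.9b) against each other is to code both as anonymous functions and measure the round-trip discrepancy, as nm1p09 does. This sketch fixes |x|max = |y|max = 1 and μ = 255 for brevity; the handle names g and ginv are hypothetical, and it is not the book's mulaw_inv().

```matlab
mu = 255; xmax = 1; ymax = 1;
g    = @(x) ymax*log(1 + mu*abs(x)/xmax)/log(1 + mu).*sign(x);   % (P1.9a)
ginv = @(y) xmax/mu*((1 + mu).^(abs(y)/ymax) - 1).*sign(y);       % (P1.9b)

x  = -1:0.005:1;
y  = g(x);                      % compressed values, |y| <= ymax
x0 = ginv(y);                   % expanded back
discrepancy = norm(x - x0);     % should be at round-off level
```

A discrepancy far above round-off would point to an error in one of the two formulas, which is exactly what nm1p09 is designed to expose.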


1.10 Analog-to-Digital Converter (ADC) 

Below are two ADC routines adc1(a,b,c) and adc2(a,b,c), which assign the corresponding digital value c(i) to each analog datum belonging to the quantization interval [b(i), b(i+1)]. Let the boundary vector and the centroid vector be, respectively,

b = [-3 -2 -1 0 1 2 3]; c = [-2.5 -1.5 -0.5 0.5 1.5 2.5];

(a) Make a program that uses the two ADC routines to find the output d for the analog input data a = [-300:300]/100 and plots d versus a to see the input-output relationship of the ADC, which is supposed to be like Fig. P1.10a.


function d = adc1(a,b,c)
%Analog-to-Digital Converter
%Input a = analog signal, b(1:N + 1) = boundary vector
%      c(1:N) = centroid vector
%Output: d = digital samples
N = length(c);
for n = 1:length(a)
  I = find(a(n) < b(2:N));
  if ~isempty(I), d(n) = c(I(1));
  else d(n) = c(N);
  end
end

function d = adc2(a,b,c)
N = length(c);
d(find(a < b(2))) = c(1);
for i = 2:N-1
  index = find(b(i) <= a & a <= b(i+1)); d(index) = c(i);
end
d(find(b(N) <= a)) = c(N);
Figure P1.10 The characteristic of an ADC (analog-to-digital converter): (a) the input-output relationship of an ADC; (b) the output of an ADC to a sinusoidal input.


(b) Make a program that uses the two ADC routines to find the output d for the analog input data a = 3*sin(t) with t = [0:200]/100*pi and plots a and d versus t to see how the analog input is converted into the digital output by the ADC. The graphic result is supposed to be like Fig. P1.10b.
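For comparison with the loop in adc1() and the interval-by-interval assignment in adc2(), here is a loop-free sketch (not one of the book's routines): for each sample it counts how many interior boundaries b(2:N) the sample has reached, and that count plus one is the index of its quantization interval and hence of its centroid.

```matlab
b = [-3 -2 -1 0 1 2 3];                 % boundary vector
c = [-2.5 -1.5 -0.5 0.5 1.5 2.5];       % centroid vector
a = (-300:300)/100;                     % analog input of part (a)

N = length(c);
% idx(n) = 1 + number of interior boundaries b(2:N) that are <= a(n)
idx = 1 + sum(bsxfun(@ge, a(:), b(2:N)), 2);
d = reshape(c(idx), size(a));           % digital output, same shape as a
```

A sample exactly on a boundary goes to the upper interval, which is what adc1() does as well, so the two outputs should agree.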

1.11 Playing with Polynomials 

(a) Polynomial Evaluation: polyval() 

Write a MATLAB statement to compute 

$$p(x) = x^8 - 1 \quad \text{for } x = 1 \qquad \text{(P1.11.1)}$$

(b) Polynomial Addition/Subtraction by Using Compatible Vector Addition/Subtraction

Write a MATLAB statement to add the following two polynomials:

$$p_1(x) = x^4 + 1, \qquad p_2(x) = x^3 - 2x^2 + 1 \qquad \text{(P1.11.2)}$$

(c) Polynomial Multiplication: conv()

Write a MATLAB statement to get the following product of polynomials:

$$p(x) = (x^4 + 1)(x^2 + 1)(x + 1)(x - 1) \qquad \text{(P1.11.3)}$$

(d) Polynomial Division: deconv()

Write a MATLAB statement to get the quotient and the remainder of the following polynomial division:

$$p(x) = x^8/(x^2 - 1) \qquad \text{(P1.11.4)}$$

(e) Routine for Differentiation/Integration of a Polynomial 

What you see in the box below is the routine "poly_der(p)", which gets a polynomial coefficient vector p (in descending order) and outputs the coefficient vector pd of its derivative polynomial. Likewise, you can make a routine "poly_int(p)", which outputs the coefficient vector of the integral polynomial for a given polynomial coefficient vector.

(cf) MATLAB has the built-in routines polyder()/polyint() for finding the derivative/integral of a polynomial.

function pd = poly_der(p)
%p: the vector of polynomial coefficients in descending order
N = length(p);
if N <= 1, pd = 0; % constant
else
  for i = 1:N - 1, pd(i) = p(i)*(N - i); end
end


(f) Roots of a Polynomial Equation: roots()

Write a MATLAB statement to get the roots of the following polynomial equation

$$p(x) = x^8 - 1 = 0 \qquad \text{(P1.11.5)}$$

You can check whether the result is right by using the MATLAB command poly(), which generates a polynomial having a given set of roots.

(g) Partial Fraction Expansion of a Ratio of Two Polynomials: residue()/residuez()

(i) The MATLAB routine [r,p,k] = residue(B,A) finds the partial fraction expansion for a ratio of given polynomials B(s)/A(s) as

$$\frac{B(s)}{A(s)} = \frac{b_1 s^{M-1} + b_2 s^{M-2} + \cdots + b_M}{a_1 s^{N-1} + a_2 s^{N-2} + \cdots + a_N} = \sum_{i=1}^{N-1} \frac{r(i)}{s - p(i)} + k(s) \qquad \text{(P1.11.6a)}$$

which is good for taking the inverse Laplace transform. Use this routine to find the partial fraction expansion for

$$X(s) = \frac{4s + 2}{s^3 + 6s^2 + 11s + 6} = \frac{\square}{s + \square} + \frac{\square}{s + \square} + \frac{\square}{s + \square} \qquad \text{(P1.11.7a)}$$

(ii) The MATLAB routine [r,p,k] = residuez(B,A) finds the partial fraction expansion for a ratio of given polynomials B(z)/A(z) as

$$\frac{B(z)}{A(z)} = \frac{b_1 + b_2 z^{-1} + \cdots + b_M z^{-(M-1)}}{a_1 + a_2 z^{-1} + \cdots + a_N z^{-(N-1)}} = \sum_{i=1}^{N-1} \frac{r(i)\, z}{z - p(i)} + k(z) \qquad \text{(P1.11.6b)}$$

which is good for taking the inverse z-transform. Use this routine to find the partial fraction expansion for

$$X(z) = \frac{4 + 2z^{-1}}{1 + 6z^{-1} + 11z^{-2} + 6z^{-3}} = \frac{\square\, z}{z + \square} + \frac{\square\, z}{z + \square} + \frac{\square\, z}{z + \square} \qquad \text{(P1.11.7b)}$$
(h) Piecewise Polynomial: mkpp()/ppval()

Suppose we have an M x N matrix P, the rows of which denote M (piecewise) polynomials of degree (N - 1) for different (non-overlapping) intervals with (M + 1) boundary points bb = [b(1) .. b(M + 1)], where the polynomial coefficients in each row are supposed to be generated with the interval starting from x = 0. Then we can use the MATLAB command pp = mkpp(bb,P) to construct a structure of piecewise polynomials, which can be evaluated at points xx by using ppval(pp,xx).

Figure P1.11(h) shows a set of piecewise polynomials {p1(x + 3), p2(x + 1), p3(x - 2)} for the intervals [-3, -1], [-1, 2], and [2, 4], respectively, where

$$p_1(x) = x^2, \qquad p_2(x) = -(x - 1)^2, \qquad p_3(x) = x^2 - 2 \qquad \text{(P1.11.8)}$$

Make a MATLAB program which uses mkpp()/ppval() to plot this graph.

Figure P1.11(h) The graph of piecewise polynomial functions.

(cf) You can type 'help mkpp' to see a couple of examples showing the usage of mkpp.
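The output conventions of residue() and mkpp()/ppval() can be checked on small examples that are deliberately different from the exercises: (s + 3)/(s^2 + 3s + 2) = 2/(s + 1) - 1/(s + 2), and two linear pieces on [0, 1] and [1, 3]. Since the order in which residue() returns the poles is not something to rely on, the sketch sorts them before comparing; all variable names are illustrative.

```matlab
% --- residue(): (s + 3)/((s + 1)(s + 2)) = 2/(s + 1) - 1/(s + 2) ---
B = [1 3];  A = [1 3 2];
[r, p, k] = residue(B, A);        % residues r, poles p, direct term k
[p_sorted, I] = sort(p(:));       % put the poles in a known order
r_col = r(:);  r_sorted = r_col(I);
[B2, A2] = residue(r, p, k);      % inverse call rebuilds B(s)/A(s)

% --- mkpp()/ppval(): each row is written in the local variable x - bb(i) ---
bb = [0 1 3];                     % M + 1 = 3 boundary points, M = 2 pieces
P  = [1  0;                       % piece 1: 1*(x - 0) + 0  on [0, 1]
      2 -1];                      % piece 2: 2*(x - 1) - 1  on [1, 3]
pp = mkpp(bb, P);
yy = ppval(pp, [0.5 2 3]);        % evaluates the piece containing each point
```

The second piece is evaluated as 2*(x - 1) - 1, not 2x - 1, which is the "interval starting from x = 0" convention described in (h).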

1.12 Routine for Matrix Multiplication 

Assuming that MATLAB cannot perform direct multiplication on vectors/matrices, supplement the following incomplete routine "multiply_matrix(A,B)" so that it multiplies two matrices given as its input arguments only if their dimensions are compatible, and displays an error message otherwise. Try it to get the product of two arbitrary 3 x 3 matrices generated by the command rand(3) and compare the result with that obtained by using the direct multiplicative operator *. Note that matrix multiplication can be described as

$$C(m,n) = \sum_{k=1}^{K} A(m,k)\, B(k,n) \qquad \text{(P1.12.1)}$$

function C = multiply_matrix(A,B)
[M,K] = size(A); [K1,N] = size(B);
if K1 ~= K
  error('The # of columns of A is not equal to the # of rows of B')
else
  for m = 1:
    for n = 1:
      C(m,n) = A(m,1)*B(1,n);
      for k = 2:
        C(m,n) = C(m,n) + A(m,k)*B(k,n);
      end
    end
  end
end


1.13 Function for Finding Vector Norm 

Assuming that MATLAB does not have the norm() command for finding the norm of a given vector/matrix, make a routine norm_vector(v,p), which computes the norm of a given vector as

$$\|\mathbf{v}\|_p = \left( \sum_{n=1}^{N} |v_n|^p \right)^{1/p} \qquad \text{(P1.13.1)}$$

for any positive integer p, finds the maximum absolute value of the elements for p = inf, and computes the norm as if p = 2 when the second input argument p is not given. If you have no idea, permute the statements in the box below and save it in the file named "norm_vector.m". Additionally, use it to get the norm with p = 1, 2, ∞(inf) of an arbitrary vector generated by the command rand(2,1). Compare the result with that obtained by using the norm() command.

function nv = norm_vector(v,p)
if nargin < 2, p = 2; end
nv = sum(abs(v).^p)^(1/p);
nv = max(abs(v));
if p > 0 & p ~= inf
elseif p == inf
end
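A small worked example of (P1.13.1) on the vector v = [3 -4], checked against the built-in norm(); the variable names are illustrative.

```matlab
v = [3 -4];
n1   = sum(abs(v).^1)^(1/1);   % |3| + |-4|        = 7
n2   = sum(abs(v).^2)^(1/2);   % sqrt(9 + 16)      = 5
ninf = max(abs(v));            % largest magnitude = 4
```

As p grows, the largest element increasingly dominates the sum, which is why the p = inf case reduces to max(abs(v)).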
1.14 Backslash(\) Operator

Let's play with the backslash(\) operator.

(a) Use the backslash(\) command, the minimum-norm solution (2.1.7), and the pinv() command to solve the following equations, find the residual errors ||A_i x - b_i|| and the rank of the coefficient matrix A_i, and fill in Table P1.14 with the results.



(PI.14.1) 


(ii) A 2 X 


2 

4 




(PI. 14.2) 



(PI.14.3) 


Table P1.14 Results of Operations with Backslash(\) Operator and pinv() Command



the parentheses (). 
(b) Use the backslash(\) command, the LS (least-squares) solution (2.1.10), and the pinv() command to solve the following equations, find the residual errors ||A_i x - b_i|| and the rank of the coefficient matrix A_i, and fill in Table P1.14 with the results.


(i) A_4 x = [1 2 3]^T   (P1.14.4)

(ii) A_5 x = [1 2 3]^T   (P1.14.5)

(iii) A_6 x = [1 2 3]^T   (P1.14.6)


(cf) If some or all of the rows of the coefficient matrix A in a set of linear equations can be expressed as a linear combination of other row(s), the corresponding equations are dependent, which is revealed by rank deficiency, that is, rank(A) < min(M, N), where M and N are the row and column dimensions, respectively. If some equations are dependent, the system has either inconsistency (no exact solution) or redundancy (infinitely many solutions), which can be distinguished by checking whether augmenting the coefficient matrix A with the RHS vector b increases the rank, that is, whether rank([A b]) > rank(A) [M-2].

(c) Based on the results obtained in (a) and (b) and listed in Table P1.14, answer the following questions.

(i) Based on the results obtained in (a)(i), which of the three methods, that is, the backslash(\) operator, the minimum-norm solution (2.1.7), and the pinv() command, yielded a non-minimum-norm solution? Note that the minimum-norm solution means the solution whose norm (||x||) is the minimum over the many solutions.

(ii) Based on the results obtained in (a), which of the three methods is the most reliable means of finding the minimum-norm solution?

(iii) Based on the results obtained in (b), choose the two reliable methods for finding the LS (least-squares) solution among the three methods, that is, the backslash(\) operator, the LS solution (2.1.10), and the pinv() command. Note that the LS solution means the solution for which the residual error (||Ax - b||) is the minimum over the many solutions.
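The rank test described in the (cf) of (b) can be tried on a tiny 2 x 2 system, chosen here only for illustration: the second row of A is twice the first, so rank(A) = 1, and whether appending b raises the rank tells redundancy from inconsistency.

```matlab
A = [1 2; 2 4];                  % row 2 = 2 x row 1, so rank(A) = 1 < 2
b_consistent   = [3; 6];         % obeys the same dependency
b_inconsistent = [3; 5];         % breaks it

rank_A   = rank(A);
rank_Ab1 = rank([A b_consistent]);    % no increase -> redundancy
rank_Ab2 = rank([A b_inconsistent]);  % increase    -> inconsistency

x_mn = pinv(A)*b_consistent;     % minimum-norm solution among infinitely many
```

Every x with x1 + 2*x2 = 3 solves the consistent system; pinv() picks the one closest to the origin, [0.6; 1.2].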

1.15 Operations on Vectors 

(a) Find the mathematical expression for the computation to be done by 
the following MATLAB statements. 

>>n = 0:100; S = sum(2.^-n)

(b) Write a MATLAB statement that performs the following computation. 



(c) Write a MATLAB statement which uses the commands prod() and 
sum() to compute the product of the sums of each row of a 3 x 3 
random matrix. 

(d) How does the following MATLAB routine "repetition(x,M,m)" convert a given row vector sequence x to make a new sequence y?


function y = repetition(x,M,m)
if m == 1
  MNx = ones(M,1)*x; y = MNx(:)';
else
  Nx = length(x); N = ceil(Nx/m);
  x = [x zeros(1,N*m - Nx)];
  MNx = ones(M,1)*x;
  y = [];
  for n = 1:N
    tmp = MNx(:,(n - 1)*m + [1:m]).';
    y = [y tmp(:).'];
  end
end


(e) Make a MATLAB routine "zero_insertion(x,M,m)", which inserts m zeros just after every Mth element of a given row vector sequence x to make a new sequence. Write a MATLAB statement to apply the routine for inserting two zeros just after every third element of x = [1 3 7 2 4 9] to get

y = [1 3 7 0 0 2 4 9 0 0]

(f) How does the following MATLAB routine "zeroing(x,M,m)" convert a given row vector sequence x to make a new sequence y?

function y = zeroing(x,M,m)
%zero out every (kM - m)th element
if nargin < 3, m = 0; end
if M <= 0, M = 1; end
m = mod(m,M);
Nx = length(x); N = floor(Nx/M);
y = x; y(M*[1:N] - m) = 0;


(g) Make a MATLAB routine "sampling(x,M,m)", which samples every (kM - m)th element of a given row vector sequence x to make a new sequence. Write a MATLAB statement to apply the routine for sampling every (3k - 2)th element of x = [1 3 7 2 4 9] to get

y = [1 2]

(h) Make a MATLAB routine "rotate_r(x,M)", which rotates a given row vector sequence x right by M samples, say, making rotate_r([1 2 3 4 5],3) = [3 4 5 1 2].
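The m == 1 branch of repetition() in (d) relies on the fact that A(:) reads a matrix column by column. A minimal check of that idiom is sketched below and compared with kron(), which is a different built-in way of producing the same element-wise repetition; the sample values are arbitrary.

```matlab
x = [1 3 7];  M = 2;
MNx = ones(M,1)*x;          % M copies of x stacked as rows: [1 3 7; 1 3 7]
y1  = MNx(:)';              % column-major read-out repeats each element M times
y2  = kron(x, ones(1,M));   % the same result via the Kronecker product
```

Both give [1 1 3 3 7 7]; the colon idiom avoids building the Kronecker product but depends on MATLAB's column-major storage.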

1.16 Distribution of a Random Variable: Histogram 

Make a routine randu(N,a,b), which uses the MATLAB function rand() to generate an N-dimensional random vector having the uniform distribution over [a, b] and depicts the distribution of the elements of the generated vector in the form of a histogram divided into 20 sections as in Fig. 1.7. Then see what you get by typing the following statement into the MATLAB command window.

>>randu(1000,-2,2)

What is the average height of the histogram?
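The counting part of such a routine can be sketched without any plotting call; assigning bins with floor() and counting them with accumarray() are choices made here, not the book's method, and the counts could then be drawn with bar(counts). Whatever the random draw, the 20 counts always add up to N.

```matlab
N = 1000; a = -2; b = 2;
x = a + (b - a)*rand(N,1);                 % uniform samples on (a, b)

nbins = 20;
bin = floor((x - a)/(b - a)*nbins) + 1;    % bin number 1..20
bin = min(max(bin, 1), nbins);             % guard against round-off at the ends
counts = accumarray(bin, 1, [nbins 1]);    % number of samples in each bin
```

Individual counts fluctuate from run to run, but their sum is fixed, so their mean is fixed as well.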

1.17 Number Representation 

In Section 1.2.1, we looked over how a number is represented in 64 bits. For example, the IEEE 64-bit floating-point number system represents the number 3 (2^1 < 3 < 2^2), belonging to the range R_1 = [2^1, 2^2) with E = 1, as

| 0 | 100 0000 0000 | 1000 0000 0000 .... 0000 0000 0000 0000 0000 |
(hexadecimal: 4 0 0 8 0 0 .... 0 0 0 0 0)

where the exponent and the mantissa are

Exp = E + 1023 = 1 + 1023 = 1024 = 2^10 = 100 0000 0000
M = (3 x 2^-E - 1) x 2^52 = 2^51 = 1000 0000 0000 .... 0000 0000 0000 0000 0000
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This can be confirmed by typing the following statement into MATLAB 
command window. 

»fprintf('3 = %bx\n',3) or »format hex, 3, format short 

which will print out onto the screen 

0000000000000840 4008000000000000 

Noting that more significant byte (8[bits] = 2[hexadecimal digits]) of a 
number is stored in the memory of higher address number in the INTEL 
system, we can reverse the order of the bytes in this number to see the 
number having the most/least significant byte on the left/right side as we 
can see in the daily life. 

00 00 00 00 00 00 08 40 -* 40 08 00 00 00 00 00 00 

This is exactly the hexadecimal representation of the number 3 as we 
expected. You can find the IEEE 64-bit floating-point number represen¬ 
tation of the number 14 and use the command fprintf() or format hex to 
check if the result is right. 
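A sketch for checking such bit patterns field by field with the built-in num2hex(), which lists the hex digits most significant first, so no byte reversal is needed. The variable names are illustrative only.

```matlab
% Decode the IEEE 754 double-precision bit fields of a number
x = 3;
h = num2hex(x);                                    % 16 hex digits, most significant first
bits = reshape(dec2bin(hex2dec(h(:)), 4)', 1, []); % 64-character bit string
sign_bit = bits(1);                                % '0' for a positive number
Exp = bin2dec(bits(2:12));                         % biased exponent, E + 1023
E = Exp - 1023;
M = bits(13:64);                                   % 52 stored mantissa bits (hidden 1 omitted)
fprintf('hex = %s, sign = %s, Exp = %d (E = %d)\n', h, sign_bit, Exp, E);
fprintf('mantissa = %s\n', M);
```

Setting x = 14 instead gives the pattern asked for at the end of the problem.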


Figure P1.18 Procedure of addition/subtraction with four mantissa bits (panel shown: procedure of subtracting $2^1$ from $2^3$).


1.18 Resolution of Number Representation and Quantization Error

In Section 1.2.1, we have seen that adding $2^{-22}$ to $2^{30}$ makes some difference, while adding $2^{-23}$ to $2^{30}$ makes no difference because $2^{-23}$ must be shifted right by more than 52 bits for alignment before the addition. How about subtracting $2^{-23}$ from $2^{30}$? In contrast with the addition of $2^{-23}$ to $2^{30}$, it makes a difference, as you can see by typing the following statement into the MATLAB command window:

>>x = 2^30; x + 2^-23 == x, x - 2^-23 == x

which will give you the logical answers 1 (true) and 0 (false). Justify this result based on the difference between the resolutions of the two ranges $[2^{30}, 2^{31})$ and $[2^{29}, 2^{30})$, to which the true values of the computational results $(2^{30} + 2^{-23})$ and $(2^{30} - 2^{-23})$ belong, respectively. Note from Eq. (1.2.5) that the resolutions (that is, the maximum quantization errors) are $\Delta_E = 2^{E-52} = 2^{-52+30} = 2^{-22}$ and $2^{-52+29} = 2^{-23}$, respectively. For details, refer to Fig. P1.18, which illustrates the procedure of addition/subtraction with four mantissa bits, one hidden bit, and one guard bit.
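The two resolutions in the argument above can be read off with the built-in eps(x), which returns the distance from x to the next larger double. A short sketch:

```matlab
% Spacing of doubles on either side of 2^30
x = 2^30;
gap_above = eps(x)           % spacing in [2^30, 2^31): 2^(30-52) = 2^-22
gap_below = eps(x/2)         % spacing in [2^29, 2^30): 2^(29-52) = 2^-23
add_ok = (x + 2^-23 == x)    % 1: 2^-23 is half a gap; the tie rounds to the even neighbor x
sub_ok = (x - 2^-23 == x)    % 0: x - 2^-23 is itself a double in [2^29, 2^30)
```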

1.19 Resolution of Number Representation and Quantization Error

(a) What is the result of typing the following statement into the MATLAB command window?

>>7/100*100 - 7

How does the absolute value of this answer compare with the resolution Δ of the range to which 7 belongs?

(b) Find how many numbers among the integers from 1 to 31 are susceptible to this kind of quantization error caused by division/multiplication by 100.

(c) What will be the result of running the following program? Why?

%nm1p19: Quantization Error
x = 2 - 2^-50;
for n = 1:2^3
  x = x + 2^-52; fprintf('%20.18E\n',x)
end
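A sketch for parts (a) and (b). It uses eps(7) as the resolution Δ of the range [4, 8), and it assumes that "susceptible" means n/100*100 differs from n in floating point.

```matlab
% (a) quantization error of 7/100*100 versus the resolution near 7
d = 7/100*100 - 7
resolution = eps(7)          % spacing of doubles in [4, 8): 2^(2-52) = 2^-50
% (b) integers 1..31 that do not survive division/multiplication by 100
n = 1:31;
affected = n(n/100*100 ~= n)
num_affected = numel(affected)
```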


1.20 Avoiding Large Errors/Overflow/Underflow

(a) For $x = 9.8^{201}$ and $y = 10.2^{199}$, evaluate the following two expressions, which are mathematically equivalent, and tell which is better in terms of resisting overflow.

(i) $z = \sqrt{x^2 + y^2}$ (P1.20.1a)

(ii) $z = y\sqrt{(x/y)^2 + 1}$ (P1.20.1b)

Also, for $x = 9.8^{-201}$ and $y = 10.2^{-199}$, evaluate the above two expressions and tell which is better in terms of resisting underflow.
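A sketch of both evaluations. The built-in hypot() is used here only as an independent reference value; it is not part of the problem.

```matlab
% Overflow: x^2 exceeds realmax, while the scaled form stays finite
x = 9.8^201; y = 10.2^199;
z1 = sqrt(x^2 + y^2)            % (P1.20.1a)
z2 = y*sqrt((x/y)^2 + 1)        % (P1.20.1b)
% Underflow: x^2 and y^2 flush to zero, while the scaled form keeps the size
xu = 9.8^-201; yu = 10.2^-199;
z1u = sqrt(xu^2 + yu^2)
z2u = yu*sqrt((xu/yu)^2 + 1)
ref = hypot(xu, yu)             % built-in sqrt(x^2 + y^2) that avoids over/underflow
```

Scaling by y keeps every intermediate quantity near the size of the answer, which is why (P1.20.1b) survives both extremes here.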

(b) With a = c = 1 and for 100 values of b over the interval $[10^{7.4}, 10^{8.5}]$ generated by the MATLAB command ‘logspace(7.4,8.5,100)’, evaluate the following two formulas (for the roots of a quadratic equation), which are mathematically equivalent, and plot the values of the second root of each pair. Noting that the true values are not available, so the shape of the solution graph is the only practical basis on which we can assess the quality of the numerical solutions, tell which is better in terms of resisting the loss of significance.

(i) $x_1, x_2 = \frac{1}{2a}\left(-b \mp \mathrm{sign}(b)\sqrt{b^2 - 4ac}\right)$ (P1.20.2a)

(ii) $x_1 = \frac{1}{2a}\left(-b - \mathrm{sign}(b)\sqrt{b^2 - 4ac}\right), \quad x_2 = \frac{c}{a x_1}$ (P1.20.2b)
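Since the true roots are not available, one sketch-level check is Vieta's relation $x_1 x_2 = c/a$, which the second root from (P1.20.2b) satisfies by construction. The comparison below is an illustration, not the plot the problem asks for.

```matlab
a = 1; c = 1;
b = logspace(7.4, 8.5, 100);
sq = sqrt(b.^2 - 4*a*c);
x1 = (-b - sign(b).*sq)/(2*a);     % large-magnitude root: the two terms add, no cancellation
x2_i = (-b + sign(b).*sq)/(2*a);   % (P1.20.2a): -b and sq nearly cancel
x2_ii = c./(a*x1);                 % (P1.20.2b): uses x1*x2 = c/a
vieta_i = max(abs(x1.*x2_i - c/a))   % how far (i) is from x1*x2 = c/a
vieta_ii = max(abs(x1.*x2_ii - c/a))
```

semilogx(b, x2_i, b, x2_ii) then plots the two versions of the second root for visual comparison.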


(c) For 100 values of x over the interval $[10^{14}, 10^{16}]$, evaluate the following two expressions, which are mathematically equivalent, plot them, and, based on the graphs, tell which is better in terms of resisting the loss of significance.


(i) y = 

(ii) y = 


V2x 2 +1-1 
2x 2 


(P1.20.3a)
(P1.20.3b)


(d) For 100 values of x over the interval $[10^{-9}, 10^{-7.4}]$, evaluate the following two expressions, which are mathematically equivalent, plot them, and, based on the graphs, tell which is better in terms of resisting the loss of significance.


(i) y =

(ii) y =


(P1.20.4a)
(P1.20.4b)


(e) To find the value of $(300^{125}/125!)e^{-300}$, type the following statement into the MATLAB command window.

>>300^125/prod([1:125])*exp(-300)

What is the result? Is it of any help to change the order of multiplication/division? As an alternative, make a routine which evaluates the expression

$p(k) = \frac{\lambda^k}{k!}e^{-\lambda}$ for $\lambda = 300$ and an integer $k$ (P1.20.5)

in a recursive way, say, like $p(k+1) = p(k)\cdot\lambda/(k+1)$, and then use the routine to find the value of $(300^{125}/125!)e^{-300}$.
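A sketch of the recursive evaluation, written as a plain loop rather than a separate routine. The log-domain value built from gammaln() serves only as an independent cross-check.

```matlab
lambda = 300; K = 125;
direct = 300^125/prod(1:125)*exp(-300)   % 300^125 overflows to Inf before anything else happens
p = exp(-lambda);                        % p(0) = e^{-300}, still a normal double (about 5e-131)
for k = 0:K-1
  p = p*lambda/(k + 1);                  % p(k+1) = p(k)*lambda/(k+1)
end
p                                        % p(125)
p_check = exp(K*log(lambda) - gammaln(K + 1) - lambda)  % log-domain cross-check
```

Each step multiplies by a moderate ratio λ/(k+1), so no intermediate value ever overflows or underflows.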
(f) Make a routine which computes the sum

$S(K) = \sum_{k=0}^{K}\frac{\lambda^k}{k!}e^{-\lambda}$ for $\lambda = 100$ and an integer $K$ (P1.20.6)

and then use the routine to find the value of S(155).
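A sketch that reuses the recursion of part (e) and accumulates the terms as it goes, assuming the sum starts at k = 0:

```matlab
lambda = 100; K = 155;
p = exp(-lambda);        % p(0) = e^{-100}
S = p;
for k = 0:K-1
  p = p*lambda/(k + 1);  % next term, p(k+1)
  S = S + p;             % S now holds the sum up to k+1
end
S                        % S(155)
```

Evaluating 100.^(0:155)./factorial(0:155) directly would fail, since 100^155 = 1e310 already exceeds realmax.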
1.21 Recursive Routines for Efficient Computation 
(a) The Hermite Polynomial [K-1]

Consider the Hermite polynomial defined as

$H_0(x) = 1, \quad H_N(x) = (-1)^N e^{x^2}\frac{d^N}{dx^N}e^{-x^2}$ (P1.21.1)

(i) Show that the derivative of this polynomial function can be written as

$H'_N(x) = 2xH_N(x) - H_{N+1}(x)$ (P1.21.2)

and so the (N+1)th-degree Hermite polynomial can be obtained recursively from the Nth-degree Hermite polynomial as

$H_{N+1}(x) = 2xH_N(x) - H'_N(x)$ (P1.21.3)

(ii) Make a MATLAB routine “Hermitp(N)” which uses Eq. (P1.21.3) to generate the Nth-degree Hermite polynomial $H_N(x)$.
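A sketch of Eq. (P1.21.3) on polynomial coefficient vectors (descending powers, the convention of polyval() and polyder()), written as a script loop rather than the requested Hermitp() routine:

```matlab
% Build H_N(x) from H_0(x) = 1 with H_{n+1} = 2x*H_n - H_n'
N = 4;
H = 1;                                             % H_0(x) = 1
for n = 0:N-1
  dH = polyder(H);                                 % H_n'(x)
  H = [2*H 0] - [zeros(1, n + 2 - length(dH)) dH]; % 2x*H_n(x) - H_n'(x), aligned to degree n+1
end
H                                                  % coefficients of H_4(x)
```

Appending a 0 multiplies a polynomial by x, and left-padding the derivative with zeros aligns the two vectors before subtracting.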

(b) The Bessel Function of the First Kind [K-1]

Consider the Bessel function of the first kind of order k defined as

$J_k(\beta) = \frac{1}{\pi}\int_0^{\pi}\cos(k\delta - \beta\sin\delta)\,d\delta$ (P1.21.4a)

$\quad = \left(\frac{\beta}{2}\right)^k\sum_{m=0}^{\infty}\frac{(-1)^m\beta^{2m}}{4^m\,m!\,(m+k)!} = (-1)^k J_{-k}(\beta)$ (P1.21.4b)

(i) Define the integrand of (P1.21.4a) under the name ‘Bessel_integrand(x,beta,k)’ and store it in an M-file named “Bessel_integrand.m”.

(ii) Complete the following routine “Jkb(K,beta)”, which uses (P1.21.4b) in a recursive way to compute $J_k(\beta)$ of order k = 1:K for given K and β (beta).

(iii) Run the following program nm1p21b, which uses Eqs. (P1.21.4a) and (P1.21.4b) to get $J_{15}(\beta)$ for β = 0:0.05:15. What is the norm of the difference between the two results? How do the running times of the two methods compare?

(cf) Note that Jkb(K,beta) computes $J_k(\beta)$ of order k = 1:K, while the integration does so only for k = K.


function [J,JJ] = Jkb(K,beta) %the 1st kind of kth-order Bessel ftn
tmpk = ones(size(beta));
for k = 0:K
  tmp = tmpk; JJ(k + 1,:) = tmp;
  for m = 1:100
    tmp = ?????????????????????;
    JJ(k + 1,:) = JJ(k + 1,:) + tmp;
    if norm(tmp) < .001, break; end
  end
  tmpk = tmpk.*beta/2/(k + 1);
end
J = JJ(K + 1,:);

%nm1p21b: Bessel_ftn
clear, clf
beta = 0:.05:15; K = 15;
tic
for i = 1:length(beta) %Integration
  J151(i) = quad('Bessel_integrand',0,pi,[],0,beta(i),K)/pi;
end
toc
tic, J152 = Jkb(K,beta); toc %Recursive Computation
discrepancy = norm(J151 - J152)
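As an independent check on the integral form (P1.21.4a), here is a sketch that evaluates it with the built-in trapz() and compares the result with the built-in besselj(). It deliberately leaves the blank in Jkb() untouched. Because the integrand is smooth, even in δ, and 2π-periodic, the plain trapezoidal rule over [0, π] is already very accurate here.

```matlab
% Integral form (P1.21.4a) by the trapezoidal rule, checked against besselj()
k = 15; beta = 0:0.05:15;
d = linspace(0, pi, 2001);                 % nodes for the integration variable delta
J_int = zeros(size(beta));
for i = 1:length(beta)
  J_int(i) = trapz(d, cos(k*d - beta(i)*sin(d)))/pi;
end
discrepancy = norm(J_int - besselj(k, beta))
```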


1.22 Find the four routines in Chapters 5 and 7 that are built in a nested (recursive calling) structure.

(cf) Don't those algorithms, which are the souls of the routines, seem to have been born to be in a nested structure?

1.23 Avoiding Runtime Errors in Case of Deficient/Nonadmissible Input Arguments

(a) Consider the MATLAB routine “rotation_r(x,M)”, which you made in Problem 1.15(h). Does it still work when the user gives a negative integer as the second input argument M? If not, add a statement so that it performs the rotation left by -M samples for M < 0, say, making

rotation_r([1 2 3 4 5],-2) = [3 4 5 1 2]
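One hedged sketch of a rotation that needs no special case for M < 0: computing the source indices with mod() handles right (M > 0) and left (M < 0) rotations alike. The anonymous function below is a hypothetical stand-in for the M-file routine.

```matlab
% Hypothetical anonymous-function version of rotation_r(x,M)
rotation_r = @(x, M) x(mod((0:length(x) - 1) - M, length(x)) + 1);
y_right = rotation_r([1 2 3 4 5], 3)    % rotate right by 3
y_left = rotation_r([1 2 3 4 5], -2)    % rotate left by 2
```

MATLAB's mod() returns a result with the sign of the divisor, so negative index offsets wrap around to valid positions.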

(b) Consider the routine ‘trpzds(f,a,b,N)’ in Section 5.6, which computes the integral of function f over [a, b] by dividing the integration interval into N sections and applying the trapezoidal rule. If the user tries to use it without the fourth input argument N, will it work? If not, make it work with N = 1000 by default even without the fourth input argument N.
function INTf = trpzds(f,a,b,N)
%integral of f(x) over [a,b] by trapezoidal rule with N segments
if abs(b - a) < eps | N <= 0, INTf = 0; return; end
h = (b - a)/N; x = a + [0:N]*h;
fx = feval(f,x); %values of f for all nodes
INTf = h*((fx(1) + fx(N + 1))/2 + sum(fx(2:N))); %Eq.(5.6.1)


1.24 Parameter Passing through varargin

Consider the integration routine ‘trpzds(f,a,b,N)’ in Section 5.6. Can you apply it to compute the integral of a function with some parameter(s), like the ‘Bessel_integrand(x,beta,k)’ that you defined in Problem 1.21? If not, modify it so that it works for a function with some parameter(s) (see Section 1.3.6) and save it in the M-file named ‘trpzds_par.m’. Then replace the ‘quad()’ statement in the program ‘nm1p21b’ (introduced in Problem 1.21) by an appropriate ‘trpzds_par()’ statement (with N = 1000) and run the program. What is the discrepancy between the integration results obtained by this routine and the recursive computation based on Eq. (P1.21.4b)? Is it comparable with that obtained with ‘quad()’? How does the running time of this routine compare with that of ‘quad()’? Why do you think it takes so much time to execute the ‘quad()’ routine?
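A sketch of the parameter-forwarding idea: everything after N is collected in varargin and passed on to f as varargin{:}. The anonymous trpzds_par below is a hypothetical stand-in for the requested M-file, and besselj() is used only to check the result.

```matlab
% Hypothetical stand-in for trpzds_par(): extra arguments after N are forwarded to f
trpzds_par = @(f, a, b, N, varargin) (b - a)/N*( ...
  sum(f(a + (0:N)*(b - a)/N, varargin{:})) ...
  - (f(a, varargin{:}) + f(b, varargin{:}))/2);   % Eq.(5.6.1): all nodes minus half of each end
Bessel_integrand = @(x, beta, k) cos(k*x - beta*sin(x));  % integrand of (P1.21.4a)
J1 = trpzds_par(Bessel_integrand, 0, pi, 1000, 2.0, 1)/pi  % J_1(2): beta = 2, k = 1
```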

1.25 Adaptive Input Arguments to Avoid Runtime Errors in the Case of Different Input Arguments

Consider the integration routine ‘trpzds(f,a,b,N)’ in Section 5.6. If a user tries to use this routine with the following statement, will it work?

trpzds(f,[a b],N) or trpzds(f,[a b])

If not, modify it so that it works for such a usage (with a bound vector as the second input argument) as well as for the standard usage, and save it in the M-file named ‘trpzds_bnd.m’. Then try it to find the integral of $e^{-t}$ over [0, 100] by typing the following statements in the MATLAB command window. What did you get?

>>ftn = inline('exp(-t)','t');
>>trpzds_bnd(ftn,[0 100],1000)
>>trpzds_bnd(ftn,[0 100])
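For comparison, here is a sketch of the same integral that uses the built-in trapz() with a bound vector ab. The name ab is introduced here for illustration, and this is not the trpzds_bnd() routine the problem asks for. In recent MATLAB releases inline() is no longer recommended, and an anonymous function serves the same purpose.

```matlab
ftn = @(t) exp(-t);            % anonymous-function counterpart of inline('exp(-t)','t')
ab = [0 100]; N = 1000;        % bound vector and number of segments
t = linspace(ab(1), ab(2), N + 1);
I = trapz(t, ftn(t))           % trapezoidal estimate over [0, 100]
exact = 1 - exp(-100)          % closed form
```

With the step h = 0.1, the trapezoidal rule slightly overestimates this convex integrand, by roughly h^2/12.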

1.26 CtFT (Continuous-Time Fourier Transform) of an Arbitrary Signal

Consider the following definitions of CtFT and ICtFT (Inverse CtFT) [W-4]:

$X(\omega) = F\{x(t)\} = \int_{-\infty}^{\infty}x(t)e^{-j\omega t}\,dt$: CtFT (P1.26.1a)

$x(t) = F^{-1}\{X(\omega)\} = \frac{1}{2\pi}\int_{-\infty}^{\infty}X(\omega)e^{j\omega t}\,d\omega$: ICtFT (P1.26.1b)
(a) Similarly to the MATLAB routine “CtFTI(x,Dt,w)”, which computes the CtFT (P1.26.1a) of x(t) over [-Dt, Dt] for w, make a MATLAB routine “ICtFTI(X,Bw,t)” computing the ICtFT (P1.26.1b) of X(w) over [-Bw, Bw] for t. You can choose any integration routine, including ‘trpzds_par()’ (Problem 1.24) and ‘quad()’, considering the running time.

(b) The following program ‘nm1p26’ finds the CtFT of a rectangular pulse (with duration [-1,1]) defined by ‘rDt()’ for ω = [-6π, +6π] and the ICtFT of a sinc spectrum (with bandwidth 2π) defined by ‘sincBw()’ for t = [-5, +5]. After saving the routines into M-files with the appropriate names, run the program to see the rectangular pulse, its CtFT spectrum, a sinc spectrum, and its ICtFT. If it doesn't work, modify/supplement the routines so that you can rerun it to see the signals and their spectra.


function Xw = CtFTI(x,Dt,w)
x_ejkwt = inline([x '(t).*exp(-j*w*t)'],'t','w');
Xw = trpzds_par(x_ejkwt,-Dt,Dt,1000,w);
%Xw = quad(x_ejkwt,-Dt,Dt,[],0,w);

function xt = ICtFTI(X,Bw,t)


function x = rDt(t)
x = (-D/2 <= t & t <= D/2);

function X = sincBw(w)
X = 2*pi/B*sinc(w/B);

%nm1p26: CtFT and ICtFT
clear, clf
global B D
%CtFT of a Rectangular Pulse Function
t = [-50:50]/10; %time vector
w = [-60:60]/10*pi; %frequency vector
D = 1; %Duration of a rectangular pulse rD(t)
for k = 1:length(w), Xw(k) = CtFTI('rDt',D*5,w(k)); end
subplot(221), plot(t,rDt(t))
subplot(222), plot(w,abs(Xw))
%ICtFT of a Sinc Spectrum
B = 2*pi; %Bandwidth of a sinc spectrum sncB(w)
for n = 1:length(t), xt(n) = ICtFTI('sincBw',B*5,t(n)); end
subplot(223), plot(t,real(xt))
subplot(224), plot(w,sincBw(w))
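To judge whether a numerical CtFT is right, it helps to compare it with a closed form. A sketch for the pulse rD(t) = 1 on |t| <= D/2 (D = 1, as in the program), whose CtFT is 2 sin(ωD/2)/ω. The sample frequencies are an arbitrary choice, and the integration is restricted to the pulse support, where the integrand is nonzero.

```matlab
D = 1;
w = [-6 -2.5 0 1 4]*pi;                 % a few sample frequencies (arbitrary choice)
t = linspace(-D/2, D/2, 2001);          % the integrand vanishes outside this interval
Xw = zeros(size(w));
for k = 1:length(w)
  Xw(k) = trapz(t, exp(-1j*w(k)*t));    % (P1.26.1a) restricted to the pulse support
end
X_exact = D*ones(size(w));              % value at w = 0
nz = (w ~= 0);
X_exact(nz) = 2*sin(w(nz)*D/2)./w(nz);  % closed form 2 sin(wD/2)/w
max_err = max(abs(Xw - X_exact))
```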
SYSTEM OF LINEAR EQUATIONS


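In this chapter, we deal with several numerical schemes for solving a system of equations

$a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N = b_1$
$a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N = b_2$
$\cdots$
$a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N = b_M$ (2.0.1a)

which can be written in a compact form by using a matrix-vector notation as

$A_{M\times N}\,x = b$ (2.0.1b)

where

$A_{M\times N} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & & \vdots \\ a_{M1} & a_{M2} & \cdots & a_{MN} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_M \end{bmatrix}$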

We will deal with the three cases: 


(i) The case where the number (M) of equations and the number (N) of 
unknowns are equal (M = N) so that the coefficient matrix A MxN is 
square. 

Applied Numerical Methods Using MATLAB ®, by Yang, Cao, Chung, and Morris 
(ii) The case where the number (M) of equations is smaller than the number (N) of unknowns (M < N), so that we might have to find the minimum-norm solution among the numerous solutions.

(iii) The case where the number of equations is greater than the number of 
unknowns (M > N) so that there might exist no exact solution and we 
must find a solution based on global error minimization, like the “LSE 
(Least-squares error) solution.” 


2.1 SOLUTION FOR A SYSTEM OF LINEAR EQUATIONS 

2.1.1 The Nonsingular Case (M = N) 

If the number (M) of equations and the number (N) of unknowns are equal 
( M = N ), then the coefficient matrix A is square so that the solution can be 
written as 

$x = A^{-1}b$ (2.1.1)

so long as the matrix A is not singular. There are MATLAB commands for 
this job. 

>>A = [1 2;3 4]; b = [-1;-1];
>>x = A^-1*b %or, x = inv(A)*b
x = 1.0000
   -1.0000

What if A is square, but singular?

>>A = [1 2;2 4]; b = [-1;-1];
>>x = A^-1*b
Warning: Matrix is singular to working precision.
x = -Inf
    -Inf

This is the case where some or all of the rows of the coefficient matrix A are 
dependent on other rows and so the rank of A is deficient, which implies that 
there are some equations equivalent to or inconsistent with other equations. If 
we remove the dependent rows until all the (remaining) rows are independent of 
each other so that A has full rank (equal to M), it leads to the case of M < N, 
which will be dealt with in the next section. 

2.1.2 The Underdetermined Case (M < N ): Minimum-Norm Solution 

If the number (M) of equations is less than the number (N) of unknowns, the solution is not unique; there are infinitely many solutions. Suppose the M rows of the coefficient matrix A are independent. Then, any N-dimensional vector can be decomposed into two components

$x = x^+ + x^-$ (2.1.2)
where one component, lying in the row space R(A) of A, can be expressed as a linear combination of the M row vectors

$x^+ = A^T\alpha$ (2.1.3)

and the other, lying in the null space N(A) orthogonal (perpendicular) to the row space,^1 satisfies

$Ax^- = 0$ (2.1.4)

Substituting the arbitrary N-dimensional vector representation (2.1.2) into Eq. (2.0.1) yields

$A(x^+ + x^-) = AA^T\alpha + Ax^- \overset{(2.1.4)}{=} AA^T\alpha = b$ (2.1.5)

Since $AA^T$ is supposedly a nonsingular M × M matrix resulting from multiplying an M × N matrix by an N × M matrix, we can solve this equation for α to get

$\alpha^o = [AA^T]^{-1}b$ (2.1.6)

Then, substituting Eq. (2.1.6) into Eq. (2.1.3) yields

$x^{o+} \overset{(2.1.3)}{=} A^T\alpha^o \overset{(2.1.6)}{=} A^T[AA^T]^{-1}b$ (2.1.7)

This satisfies Eq. (2.0.1) and thus qualifies as its solution. However, it is far from being a unique solution, because adding to $x^{o+}$ any vector $x^-$ (in the null space) satisfying Eq. (2.1.4) still satisfies Eq. (2.0.1) [as seen from Eq. (2.1.5)], yielding infinitely many solutions.

Since $x^{o+}$ and $x^-$ are perpendicular, $\|x^{o+} + x^-\|^2 = \|x^{o+}\|^2 + \|x^-\|^2 \ge \|x^{o+}\|^2$; just as either of the two perpendicular legs of a right triangle is shorter than the hypotenuse, Eq. (2.1.7) represents the minimum-norm solution. Note that the matrix $A^T[AA^T]^{-1}$ is called the right pseudo- (generalized) inverse of A (see item 2 in Remark 1.1).

MATLAB has the pinv() command for obtaining the pseudo-inverse. We can use this command or the slash (/) operator to find the minimum-norm solution (2.1.7) to the system of linear equations (2.0.1).

>>A = [1 2]; b = 3;
>>x = pinv(A)*b %x = A'*(A*A')^-1*b or eye(size(A,2))/A*b, equivalently
x = 0.6000
    1.2000

Remark 2.1. Projection Operator and Minimum-Norm Solution

1. The solution (2.1.7) can be viewed as the projection of an arbitrary solution $x^o$ onto the row space R(A) of the coefficient matrix A spanned by the row vectors. The remaining component of the solution $x^o$

$x^{o-} = x^o - x^{o+} = x^o - A^T[AA^T]^{-1}b = x^o - A^T[AA^T]^{-1}Ax^o = [I - A^T[AA^T]^{-1}A]x^o$

is in the null space N(A), since it satisfies Eq. (2.1.4). Note that

$P_A = [I - A^T[AA^T]^{-1}A]$

is called the projection operator (it projects onto the null space N(A)).

^1 See the website @http://www.psc.edu/~burkardt/papers/linear_glossary.html
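A numerical sketch of item 1, using the 1 × 2 system of Example 2.1 (A = [1 2], b = 3). The particular solution x_o = [3; 0] is an arbitrary choice made here for illustration.

```matlab
A = [1 2]; b = 3;
x_o = [3; 0];                        % one arbitrary solution: 3 + 2*0 = 3
x_op = A'*((A*A')\b);                % minimum-norm solution (2.1.7)
P_A = eye(2) - A'*((A*A')\A);        % projection operator of Remark 2.1
x_om = P_A*x_o;                      % null-space component of x_o
resid_null = A*x_om                  % (numerically) zero, Eq. (2.1.4)
split_err = norm(x_o - (x_op + x_om))  % x_o = x^{o+} + x^{o-}
```

Applying P_A twice gives the same result as applying it once (P_A*P_A = P_A), which is the defining property of a projection.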

2. The solution (2.1.7) can be obtained by applying the Lagrange multiplier method (Section 7.2.1) to the constrained optimization problem in which we must find a vector x minimizing the (squared) norm $\|x\|^2$ subject to the equality constraint $Ax = b$:

$\min_{x,\lambda}\ l(x,\lambda) \overset{(7.2.2)}{=} \frac{1}{2}\|x\|^2 - \lambda^T(Ax - b) = \frac{1}{2}x^Tx - \lambda^T(Ax - b)$

By using Eq. (7.2.3), we get

$\frac{\partial}{\partial x}l = x - A^T\lambda = 0; \quad x = A^T\lambda = A^T[AA^T]^{-1}b$

$\frac{\partial}{\partial \lambda}l = -(Ax - b) = 0; \quad AA^T\lambda = b; \quad \lambda = [AA^T]^{-1}b$

Example 2.1. Minimum-Norm Solution. Consider the problem of solving the equation

$[1\ 2]\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 3; \quad Ax = b, \text{ where } A = [1\ 2],\ b = 3$ (E2.1.1)

This has infinitely many solutions, and any $x = [x_1\ x_2]^T$ satisfying this equation, or, equivalently,

$x_1 + 2x_2 = 3; \quad x_2 = -\frac{1}{2}x_1 + \frac{3}{2}$ (E2.1.2)

is a qualified solution. Equation (E2.1.2) describes the solution space as depicted in Fig. 2.1.

On the other hand, any vector in the row space of the coefficient matrix A can be expressed by Eq. (2.1.3) as

$x^+ = A^T\alpha = \begin{bmatrix} 1 \\ 2 \end{bmatrix}\alpha$ (α is a scalar, since M = 1) (E2.1.3)

and any vector in the null space of A can be expressed by Eq. (2.1.4) as

$Ax^- = [1\ 2]\begin{bmatrix} x_1^- \\ x_2^- \end{bmatrix} = 0; \quad x_2^- = -\frac{1}{2}x_1^-$ (E2.1.4)

We use Eq. (2.1.7) to obtain the minimum-norm solution

$x^{o+} = A^T[AA^T]^{-1}b = \begin{bmatrix} 1 \\ 2 \end{bmatrix}\left([1\ 2]\begin{bmatrix} 1 \\ 2 \end{bmatrix}\right)^{-1}3 = \frac{3}{5}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0.6 \\ 1.2 \end{bmatrix}$ (E2.1.5)

Note from Fig. 2.1 that the minimum-norm solution $x^{o+}$ is the intersection of the solution space and the row space and is the closest to the origin among the vectors in the solution space.
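The closing statement can be checked numerically. Every solution of (E2.1.1) has the form $x^{o+} + s[2\ -1]^T$, because $[2\ -1]^T$ spans the null space (1·2 + 2·(-1) = 0). The sketch below scans a grid of s values (the grid itself is an arbitrary choice) and reports where the norm is smallest.

```matlab
A = [1 2]; b = 3;
x_op = pinv(A)*b;                     % minimum-norm solution, (E2.1.5)
s = -1:0.01:1;
X = x_op*ones(size(s)) + [2; -1]*s;   % each column is a solution of Ax = b
resid = max(abs(A*X - b))             % all columns satisfy Ax = b
norms = sqrt(sum(X.^2, 1));
[nmin, imin] = min(norms);
s_best = s(imin)                      % the shortest solution occurs at s = 0
```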

2.1.3 The Overdetermined Case (M > N): LSE Solution 

If the number (M) of (independent) equations is greater than the number (N) of unknowns, in general there is no solution satisfying all the equations exactly. Thus we try to find the LSE (least-squares error) solution minimizing the norm of the (inevitable) error vector

$e = Ax - b$ (2.1.8)

Then, our problem is to minimize the objective function

$J = \frac{1}{2}\|e\|^2 = \frac{1}{2}\|Ax - b\|^2 = \frac{1}{2}[Ax - b]^T[Ax - b]$ (2.1.9)

whose solution can be obtained by setting the derivative of this function (2.1.9) with respect to x to zero:

$\frac{\partial}{\partial x}J = A^T[Ax - b] = 0; \quad x^o = [A^TA]^{-1}A^Tb$ (2.1.10)

Note that a matrix A having more rows than columns (M > N) does not have an inverse, but has its left pseudo- (generalized) inverse $[A^TA]^{-1}A^T$ as long as A is not rank-deficient, that is, as long as all of its columns are independent of each other (see item 2 in Remark 1.1). The left pseudo-inverse matrix can be computed by using the MATLAB command pinv().

The LSE solution (2.1.10) can be obtained by using the pinv() command or the backslash (\) operator.

>>A = [1; 2]; b = [2.1; 3.9];
>>x = pinv(A)*b %A\b or x = (A'*A)^-1*A'*b, equivalently
x = 1.9800


function x = lin_eq(A,B)
%This function finds the solution to Ax = B
[M,N] = size(A);
if size(B,1) ~= M
  error('Incompatible dimension of A and B in lin_eq()!')
end
if M == N, x = A^-1*B; %x = inv(A)*B or gaussj(A,B); %Eq.(2.1.1)
elseif M < N %Minimum-norm solution (2.1.7)
  x = pinv(A)*B; %A'*(A*A')^-1*B; or eye(size(A,2))/A*B
else %LSE solution (2.1.10) for M > N
  x = pinv(A)*B; %(A'*A)^-1*A'*B or x = A\B
end


The above MATLAB routine lin_eq() is designed to solve a given set of equations, covering all three cases in Sections 2.1.1, 2.1.2, and 2.1.3.

(cf) The pinv() command is remarkably powerful, as you might have felt in Problem 1.14. Even in the case of M < N, it finds an LS solution if the equations are inconsistent. Even in the case of M > N, it finds a minimum-norm solution if the equations are redundant. Actually, all three cases can be dealt with by a single pinv() command in the above routine.
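As a small illustration of this versatility, here is a sketch that applies pinv() to the singular square system of Section 2.1.1, where A^-1*b gave -Inf. For a rank-1 matrix, the pseudo-inverse reduces to A'/σ², where σ² is the sum of the squared entries. That closed form is used here only as a cross-check.

```matlab
A = [1 2; 2 4]; b = [-1; -1];
x = pinv(A)*b                 % least-squares solution with the smallest norm
x_formula = A'*b/25           % rank-1 A: pinv(A) = A'/sigma^2, sigma^2 = sum(A(:).^2) = 25
```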


2.1.4 RLSE (Recursive Least-Squares Estimation) 

In this section we will see the so-called RLSE (Recursive Least-Squares Estimation) algorithm, which is a recursive method to compute the LSE solution. Suppose we know the theoretical relationship between the temperature t[°] and the resistance R[Ω] of a resistor as

$c_1t + c_2 = R$

and we have lots of experimental data $\{(t_1, R_1), (t_2, R_2), \ldots, (t_k, R_k)\}$ collected up to time k. Since the above equation cannot be satisfied for all the data with any value of the parameters $c_1$ and $c_2$, we should try to get parameter estimates that are optimal in some sense. This corresponds to the overdetermined case dealt with in the previous section and can be formulated as an LSE problem in which we must solve a set of linear equations

$A_kx_k = b_k$, where $A_k = \begin{bmatrix} t_1 & 1 \\ t_2 & 1 \\ \vdots & \vdots \\ t_k & 1 \end{bmatrix}, \quad x_k = \begin{bmatrix} c_{1,k} \\ c_{2,k} \end{bmatrix}, \quad b_k = \begin{bmatrix} R_1 \\ R_2 \\ \vdots \\ R_k \end{bmatrix}$

for which we can apply Eq. (2.1.10) to get the solution as

$x_k = [A_k^TA_k]^{-1}A_k^Tb_k$ (2.1.11)


Now, we are given new experimental data $(t_{k+1}, R_{k+1})$ and must find the new parameter estimate

$x_{k+1} = [A_{k+1}^TA_{k+1}]^{-1}A_{k+1}^Tb_{k+1}$ (2.1.12)

with

$A_{k+1} = \begin{bmatrix} t_1 & 1 \\ \vdots & \vdots \\ t_k & 1 \\ t_{k+1} & 1 \end{bmatrix}, \quad x_{k+1} = \begin{bmatrix} c_{1,k+1} \\ c_{2,k+1} \end{bmatrix}, \quad b_{k+1} = \begin{bmatrix} R_1 \\ \vdots \\ R_k \\ R_{k+1} \end{bmatrix}$

How do we compute this? If we discard the previous estimate $x_k$ and make direct use of Eq. (2.1.12) to compute the next estimate $x_{k+1}$ every time a new data pair is available, the size of matrix A will get bigger and bigger as the data pile up, eventually defying any powerful computer in this world.

How about updating the previous estimate by just adding a correction term based on the new data to get the new estimate? This is the basic idea of the RLSE algorithm, which we are going to trace and try to understand. In order to do so, let us define the notations

$A_{k+1} = \begin{bmatrix} A_k \\ a_{k+1}^T \end{bmatrix}, \quad a_{k+1} = \begin{bmatrix} t_{k+1} \\ 1 \end{bmatrix}, \quad b_{k+1} = \begin{bmatrix} b_k \\ R_{k+1} \end{bmatrix}, \quad \text{and} \quad P_k = [A_k^TA_k]^{-1}$ (2.1.13)

and see how the inverse matrix $P_k$ is to be updated on arrival of the new data $(t_{k+1}, R_{k+1})$:

$P_{k+1} = [A_{k+1}^TA_{k+1}]^{-1} = \left[[A_k^T\ a_{k+1}]\begin{bmatrix} A_k \\ a_{k+1}^T \end{bmatrix}\right]^{-1} = [A_k^TA_k + a_{k+1}a_{k+1}^T]^{-1} = [P_k^{-1} + a_{k+1}a_{k+1}^T]^{-1}$ (2.1.14)

(Matrix Inversion Lemma in Appendix B)

$\quad = P_k - P_ka_{k+1}[a_{k+1}^TP_ka_{k+1} + 1]^{-1}a_{k+1}^TP_k$ (2.1.15)


It is interesting that $[a_{k+1}^TP_ka_{k+1} + 1]$ is nothing but a scalar, and so, thanks to the Matrix Inversion Lemma (Appendix B), we do not need to compute any matrix inverse. Computationally, it is much better to use the recursive formula (2.1.15) than to compute $[A_{k+1}^TA_{k+1}]^{-1}$ directly. We can also write Eq. (2.1.12) in a recursive form as

$x_{k+1} \overset{(2.1.12)}{=} P_{k+1}A_{k+1}^Tb_{k+1} \overset{(2.1.13)}{=} P_{k+1}[A_k^T\ a_{k+1}]\begin{bmatrix} b_k \\ R_{k+1} \end{bmatrix}$
$\quad = P_{k+1}[A_k^Tb_k + a_{k+1}R_{k+1}] \overset{(2.1.11)}{=} P_{k+1}[A_k^TA_kx_k + a_{k+1}R_{k+1}]$
$\quad \overset{(2.1.13)}{=} P_{k+1}[(A_{k+1}^TA_{k+1} - a_{k+1}a_{k+1}^T)x_k + a_{k+1}R_{k+1}]$
$\quad \overset{(2.1.13)}{=} P_{k+1}[P_{k+1}^{-1}x_k - a_{k+1}a_{k+1}^Tx_k + a_{k+1}R_{k+1}]$

$x_{k+1} = x_k + P_{k+1}a_{k+1}(R_{k+1} - a_{k+1}^Tx_k)$ (2.1.16)

We can use Eq. (2.1.15) to rewrite the gain matrix $P_{k+1}a_{k+1}$, which multiplies the ‘error’ to make the correction term on the right-hand side of Eq. (2.1.16), as

$K_{k+1} = P_{k+1}a_{k+1} \overset{(2.1.15)}{=} [P_k - P_ka_{k+1}[a_{k+1}^TP_ka_{k+1} + 1]^{-1}a_{k+1}^TP_k]a_{k+1}$
$\quad = P_ka_{k+1}[1 - [a_{k+1}^TP_ka_{k+1} + 1]^{-1}a_{k+1}^TP_ka_{k+1}]$
$\quad = P_ka_{k+1}[a_{k+1}^TP_ka_{k+1} + 1]^{-1}\{[a_{k+1}^TP_ka_{k+1} + 1] - a_{k+1}^TP_ka_{k+1}\}$

$K_{k+1} = P_ka_{k+1}[a_{k+1}^TP_ka_{k+1} + 1]^{-1}$ (2.1.17)

and substitute this back into Eq. (2.1.15) to write it as

$P_{k+1} = P_k - K_{k+1}a_{k+1}^TP_k$ (2.1.18)
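Equations (2.1.14) to (2.1.18) can be spot-checked numerically: one recursive step must reproduce the batch results of Eqs. (2.1.12) and (2.1.14). In the sketch below, the data values are made up for illustration, and inv() is used only to form the batch reference.

```matlab
t = [0.1; 0.2; 0.3; 0.4]; R = [1.21; 1.39; 1.62; 1.80];  % made-up data
A_k = [t ones(4, 1)]; b_k = R;
P_k = inv(A_k'*A_k);                 % P_k = [A_k^T A_k]^{-1}
x_k = P_k*A_k'*b_k;                  % batch estimate (2.1.11)
a = [0.5; 1]; R_new = 2.02;          % new data pair (t_{k+1}, R_{k+1}) = (0.5, 2.02)
K = P_k*a/(a'*P_k*a + 1);            % gain (2.1.17): only a scalar division
x_rec = x_k + K*(R_new - a'*x_k);    % recursive update (2.1.16)
P_rec = P_k - K*a'*P_k;              % inverse-matrix update (2.1.18)
A_k1 = [A_k; a']; b_k1 = [b_k; R_new];
P_batch = inv(A_k1'*A_k1);           % direct (2.1.14)
x_batch = P_batch*A_k1'*b_k1;        % direct (2.1.12)
fprintf('max |x_rec - x_batch| = %g\n', max(abs(x_rec - x_batch)));
fprintf('max |P_rec - P_batch| = %g\n', max(abs(P_rec(:) - P_batch(:))));
```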

The following MATLAB routine “rlse_online()” implements this RLSE (Recursive Least-Squares Estimation) algorithm, which updates the parameter estimates by using Eqs. (2.1.17), (2.1.16), and (2.1.18). The MATLAB program “do_nlse.m” updates the parameter estimates every time new data arrive and compares the results of the on-line processing with those obtained by the off-line (batch job) processing, that is, by using Eq. (2.1.12) directly. Noting that

• the matrix $[A_k^TA_k]$ carries the information in the data accumulated so far and, being a kind of squared matrix, is nonnegative definite, and

• $[A_k^TA_k]$ will get larger, or, equivalently, $P_k = [A_k^TA_k]^{-1}$ will get smaller and, consequently, the gain matrix $K_k$ will get smaller as valuable information data accumulate,

one can understand why $P_k$ is initialized to a large multiple of the identity matrix: no information is available in the beginning. Since a large/small $P_k$ makes the correction term on the right-hand side of Eq. (2.1.16) large/small, the RLSE algorithm becomes more conservative and reluctant to learn from the new data as the data pile up, while it is eager to make use of the new data for updating the estimates when it is hungry for information in the beginning.


function [x,K,P] = rlse_online(aT_k1,b_k1,x,P)
K = P*aT_k1'/(aT_k1*P*aT_k1' + 1); %Eq.(2.1.17)
x = x + K*(b_k1 - aT_k1*x); %Eq.(2.1.16)
P = P - K*aT_k1*P; %Eq.(2.1.18)

%do_nlse
clear
xo = [2 1]'; %The true value of unknown coefficient vector
NA = length(xo);
x = zeros(NA,1); P = 100*eye(NA,NA);
for k = 1:100
  A(k,:) = [k*0.01 1];
  b(k,:) = A(k,:)*xo + 0.2*rand;
  [x,K,P] = rlse_online(A(k,:),b(k,:),x,P);
end
x % the final parameter estimate
A\b % for comparison with the off-line processing (batch job)


2.2 SOLVING A SYSTEM OF LINEAR EQUATIONS
2.2.1 Gauss Elimination

For simplicity, we assume that the coefficient matrix A in Eq. (2.0.1) is a nonsingular 3 × 3 matrix with M = N = 3. Then we can write the equation as

$a_{11}x_1 + a_{12}x_2 + a_{13}x_3 = b_1$ (2.2.0a)
$a_{21}x_1 + a_{22}x_2 + a_{23}x_3 = b_2$ (2.2.0b)
$a_{31}x_1 + a_{32}x_2 + a_{33}x_3 = b_3$ (2.2.0c)
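First, to remove the $x_1$ terms from Eqs. (2.2.0b,c), that is, from the equations other than (2.2.0a), we subtract (2.2.0a) × $a_{m1}/a_{11}$ from each of them to get

$a_{11}^{(0)}x_1 + a_{12}^{(0)}x_2 + a_{13}^{(0)}x_3 = b_1^{(0)}$ (2.2.1a)
$a_{22}^{(1)}x_2 + a_{23}^{(1)}x_3 = b_2^{(1)}$ (2.2.1b)
$a_{32}^{(1)}x_2 + a_{33}^{(1)}x_3 = b_3^{(1)}$ (2.2.1c)

with

$a_{mn}^{(0)} = a_{mn}, \quad b_m^{(0)} = b_m$ for m, n = 1, 2, 3 (2.2.2a)

$a_{mn}^{(1)} = a_{mn}^{(0)} - \frac{a_{m1}^{(0)}}{a_{11}^{(0)}}a_{1n}^{(0)}, \quad b_m^{(1)} = b_m^{(0)} - \frac{a_{m1}^{(0)}}{a_{11}^{(0)}}b_1^{(0)}$ for m, n = 2, 3 (2.2.2b)

We call this work ‘pivoting at $a_{11}$’ and call the center element $a_{11}$ a ‘pivot’.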

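Next, to remove the $x_2$ term from Eq. (2.2.1c), that is, from the equation other than (2.2.1a,b), we subtract (2.2.1b) × $a_{m2}^{(1)}/a_{22}^{(1)}$ (m = 3) from it to get

$a_{11}^{(0)}x_1 + a_{12}^{(0)}x_2 + a_{13}^{(0)}x_3 = b_1^{(0)}$ (2.2.3a)
$a_{22}^{(1)}x_2 + a_{23}^{(1)}x_3 = b_2^{(1)}$ (2.2.3b)
$a_{33}^{(2)}x_3 = b_3^{(2)}$ (2.2.3c)

with

$a_{mn}^{(2)} = a_{mn}^{(1)} - \frac{a_{m2}^{(1)}}{a_{22}^{(1)}}a_{2n}^{(1)}, \quad b_m^{(2)} = b_m^{(1)} - \frac{a_{m2}^{(1)}}{a_{22}^{(1)}}b_2^{(1)}$ for m, n = 3 (2.2.4)

We call this procedure ‘Gauss forward elimination’ and can generalize the updating formula (2.2.2)/(2.2.4) as

$a_{mn}^{(k)} = a_{mn}^{(k-1)} - \frac{a_{mk}^{(k-1)}}{a_{kk}^{(k-1)}}a_{kn}^{(k-1)}$ for m, n = k+1, k+2, ..., M (2.2.5a)

$b_m^{(k)} = b_m^{(k-1)} - \frac{a_{mk}^{(k-1)}}{a_{kk}^{(k-1)}}b_k^{(k-1)}$ for m = k+1, k+2, ..., M (2.2.5b)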

After obtaining the triangular matrix-vector equation (2.2.3), we can solve Eq. (2.2.3c) first to get

$x_3 = b_3^{(2)}/a_{33}^{(2)}$ (2.2.6a)

and then substitute this result into Eq. (2.2.3b) to get

$x_2 = (b_2^{(1)} - a_{23}^{(1)}x_3)/a_{22}^{(1)}$ (2.2.6b)

Successively, we substitute Eqs. (2.2.6a,b) into Eq. (2.2.3a) to get

$x_1 = \left(b_1^{(0)} - \sum_{n=2}^{3}a_{1n}^{(0)}x_n\right)\Big/a_{11}^{(0)}$ (2.2.6c)

We call this procedure ‘backward substitution’ and can generalize the solution formula (2.2.6) as

$x_m = \left(b_m^{(m-1)} - \sum_{n=m+1}^{M}a_{mn}^{(m-1)}x_n\right)\Big/a_{mm}^{(m-1)}$ for m = M, M-1, ..., 1 (2.2.7)

In this way, the Gauss elimination procedure consists of two steps, namely, 
forward elimination and backward substitution. Noting that 

• this procedure has nothing to do with the specific values of the unknown variables $x_m$ and involves only the coefficients, and

• the formulas (2.2.5a) on the coefficient matrix A and (2.2.5b) on the RHS 
(right-hand side) vector b conform with each other, 

we will augment A with b and put the formulas (2.2.5a,b) together into one 
framework when programming the Gauss forward elimination procedure. 



2.2.2 Partial Pivoting

The core formula (2.2.5) used for Gauss elimination requires division by $a_{kk}^{(k-1)}$ at the kth stage, where $a_{kk}^{(k-1)}$ is the diagonal element in the kth row. What if $a_{kk}^{(k-1)} = 0$? In such a case, it is customary to switch the kth row with the row below it that has the element of the largest absolute value in the kth column. This procedure, called ‘partial pivoting’, is recommended for reducing the round-off error even when the kth pivot $a_{kk}^{(k-1)}$ is not zero.

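Let us consider the following example:

$\begin{bmatrix} 0 & 1 & 1 \\ 2 & -1 & -1 \\ 1 & 1 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 = 2 \\ b_2 = 0 \\ b_3 = 1 \end{bmatrix}$ (2.2.8)

We construct the augmented matrix by combining the coefficient matrix and the RHS vector to write

$\begin{bmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ a_{31} & a_{32} & a_{33} & b_3 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 1 & 2 \\ 2 & -1 & -1 & 0 \\ 1 & 1 & -1 & 1 \end{bmatrix} \begin{matrix} : r_1 \\ : r_2 \\ : r_3 \end{matrix}$ (2.2.9)

and apply the Gauss elimination procedure.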

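In the stage of forward elimination, we want to do pivoting at $a_{11}$, but $a_{11}$ cannot be used as the pivoting element because it is zero. So we switch the first row and the second row, which has the element of the largest absolute value in the first column:

$\begin{bmatrix} 2 & -1 & -1 & 0 \\ 0 & 1 & 1 & 2 \\ 1 & 1 & -1 & 1 \end{bmatrix}$

Then we do pivoting at $a_{11}^{(1)}$ by applying Eq. (2.2.2) to get

$\begin{bmatrix} 2 & -1 & -1 & 0 \\ 0 & 1 & 1 & 2 \\ 0 & 3/2 & -1/2 & 1 \end{bmatrix}$

Here, instead of pivoting at $a_{22}^{(1)}$, we switch the second row and the third row, which has the element of the largest absolute value among the elements not above $a_{22}^{(1)}$ in the second column:

$\begin{bmatrix} 2 & -1 & -1 & 0 \\ 0 & 3/2 & -1/2 & 1 \\ 0 & 1 & 1 & 2 \end{bmatrix}$

Then we do pivoting at $a_{22}^{(1)}$ by applying Eq. (2.2.4), more generally, Eq. (2.2.5), to get the upper-triangularized form:

$\begin{bmatrix} 2 & -1 & -1 & 0 \\ 0 & 3/2 & -1/2 & 1 \\ 0 & 0 & 4/3 & 4/3 \end{bmatrix}$

Now, in the stage of backward substitution, we apply Eq. (2.2.6), more generally, Eq. (2.2.7), to get the final solution as

$x_3 = b_3^{(2)}/a_{33}^{(2)} = (4/3)/(4/3) = 1$
$x_2 = (b_2^{(1)} - a_{23}^{(1)}x_3)/a_{22}^{(1)} = (1 - (-1/2)\times 1)/(3/2) = 1$
$x_1 = \left(b_1^{(0)} - \sum_{n=2}^{3}a_{1n}^{(0)}x_n\right)\Big/a_{11}^{(0)} = (0 - (-1)\times 1 - (-1)\times 1)/2 = 1$ (2.2.12)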
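The hand computation above can be replayed in a few lines. This is a plain-script sketch of Gauss elimination with partial pivoting applied to Eq. (2.2.8). It is not the book's gauss() routine, which uses scaled partial pivoting and normalizes each pivot row.

```matlab
A = [0 1 1; 2 -1 -1; 1 1 -1]; b = [2; 0; 1];
AB = [A b];                            % augmented matrix, Eq. (2.2.9)
NA = size(A, 1);
for k = 1:NA - 1
  [akx, kx] = max(abs(AB(k:NA, k)));   % largest |element| on or below the diagonal
  if akx < eps, error('Zero pivot column: no unique solution'); end
  mx = k + kx - 1;
  AB([k mx], :) = AB([mx k], :);       % row switch (no change when mx == k)
  for m = k + 1:NA                     % forward elimination, Eq. (2.2.5)
    AB(m, :) = AB(m, :) - AB(m, k)/AB(k, k)*AB(k, :);
  end
end
x = zeros(NA, 1);
for m = NA:-1:1                        % backward substitution, Eq. (2.2.7)
  x(m) = (AB(m, end) - AB(m, m + 1:NA)*x(m + 1:NA))/AB(m, m);
end
x
```

After the loop, AB holds the same upper-triangular augmented matrix as the final matrix above, with last row [0 0 4/3 4/3].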
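Let us consider another system of equations.

$\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 = 2 \\ b_2 = 3 \\ b_3 = 1 \end{bmatrix}$ (2.2.13)

We construct the augmented matrix by combining the coefficient matrix and the RHS vector to write

$\begin{bmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ a_{31} & a_{32} & a_{33} & b_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 & 2 \\ 1 & 1 & 1 & 3 \\ 1 & -1 & 1 & 1 \end{bmatrix} \begin{matrix} : r_1 \\ : r_2 \\ : r_3 \end{matrix}$ (2.2.14)

and apply the Gauss elimination procedure.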

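First, noting that all the elements in the first column have the same absolute value, so we don't need to switch the rows, we do pivoting at $a_{11}$:

$\begin{bmatrix} a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} & b_1^{(1)} \\ a_{21}^{(1)} & a_{22}^{(1)} & a_{23}^{(1)} & b_2^{(1)} \\ a_{31}^{(1)} & a_{32}^{(1)} & a_{33}^{(1)} & b_3^{(1)} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 & 2 \\ 0 & 1 & 0 & 1 \\ 0 & -1 & 0 & -1 \end{bmatrix}$ (2.2.15a)

Second, without having to switch the rows, we perform pivoting at $a_{22}^{(1)}$:

$\begin{bmatrix} a_{11}^{(2)} & a_{12}^{(2)} & a_{13}^{(2)} & b_1^{(2)} \\ a_{21}^{(2)} & a_{22}^{(2)} & a_{23}^{(2)} & b_2^{(2)} \\ a_{31}^{(2)} & a_{32}^{(2)} & a_{33}^{(2)} & b_3^{(2)} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 & 2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$ (2.2.15b)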

Now we are at the stage of backward substitution, but $a_{33}^{(2)}$, which is supposed
to be the denominator in Eq. (2.2.7), is zero. We may face such a weird situation
of zero division even during the forward elimination process, when the pivot is
zero and, besides, we cannot find any nonzero element below it in the same column
or on its right in the same row, except the RHS element. In this case, we
cannot go further. This implies that some or all rows of the coefficient matrix A are
dependent on others, corresponding to the case of redundancy (infinitely many
solutions) or inconsistency (no exact solution). Noting that the RHS element
of the zero row in Eq. (2.2.15b) is also zero, we should declare the case of
redundancy and may have to be satisfied with one of the infinitely many solutions,
namely the RHS vector

$$
[x_1\ x_2\ x_3] = [b_1^{(2)}\ b_2^{(2)}\ b_3^{(2)}] = [2\ 1\ 0]
\tag{2.2.16}
$$

Furthermore, if we remove the all-zero row(s), the problem can be treated as an
underdetermined case handled in Section 2.1.2. Note that, if the RHS element
were not zero, we would have to declare the case of inconsistency, as illustrated
next.

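Suppose that $b_1 = 1$ in Eq. (2.2.14). Then the Gauss elimination would have
proceeded as follows:

$$
\begin{bmatrix}1 & 0 & 1 & 1\\ 1 & 1 & 1 & 3\\ 1 & -1 & 1 & 1\end{bmatrix}
\to
\begin{bmatrix}1 & 0 & 1 & 1\\ 0 & 1 & 0 & 2\\ 0 & -1 & 0 & 0\end{bmatrix}
\to
\begin{bmatrix}1 & 0 & 1 & 1\\ 0 & 1 & 0 & 2\\ 0 & 0 & 0 & 2\end{bmatrix}
\tag{2.2.17}
$$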


This ended up with an all-zero row except for the nonzero RHS element, corresponding
to the case of inconsistency. So we must declare the case of ‘no exact
solution’ for this problem.
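A compact way to tell the two cases apart, not used in the text, is to compare rank(A) with the rank of the augmented matrix [A b]: equal ranks smaller than N mean redundancy (infinitely many solutions), and a larger augmented rank means inconsistency. The sketch below applies this to Eq. (2.2.13) and to its modified version with $b_1 = 1$. Note that MATLAB's built-in rank() decides numerically via the SVD with a tolerance, so for ill-conditioned matrices its verdict can be fooled (cf. Remark 2.3).

```matlab
% Classify Eq. (2.2.13) and its modified version (b1 = 1) by rank comparison.
A  = [1 0 1; 1 1 1; 1 -1 1];
bR = [2; 3; 1];    % RHS of Eq. (2.2.13): redundant case
bI = [1; 3; 1];    % b1 = 1 as in Eq. (2.2.17): inconsistent case
rA  = rank(A);
rAR = rank([A bR]);
rAI = rank([A bI]);
fprintf('rank(A) = %d, rank([A bR]) = %d, rank([A bI]) = %d\n', rA, rAR, rAI);
% rank(A) = rank([A b]) < 3  -> infinitely many solutions (redundancy)
% rank([A b]) > rank(A)      -> no exact solution (inconsistency)
residual = A*[2; 1; 0] - bR   % Eq. (2.2.16) is indeed one exact solution
```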

The following MATLAB routine “gauss()” implements the Gauss elimination
algorithm, and the program “do_gauss” is designed to solve Eq. (2.2.8) by using
“gauss()”. Note that at every pivoting operation in the routine “gauss()”, the
pivot row is divided by the pivot element so that every diagonal element becomes
one, and that we don't need to perform any computation for the kth column at
the kth stage, since that column is supposed to be all zeros except the kth element
$a_{kk}^{(k)} = 1$.


function x = gauss(A,B)
%The sizes of matrices A,B are supposed to be NA x NA and NA x NB.
%This function solves Ax = B by Gauss elimination algorithm.
NA = size(A,2); [NB1,NB] = size(B);
if NB1 ~= NA, error('A and B must have compatible dimensions'); end
N = NA + NB; AB = [A(1:NA,1:NA) B(1:NA,1:NB)]; % Augmented matrix
epss = eps*ones(NA,1);
for k = 1:NA
  %Scaled Partial Pivoting at AB(k,k) by Eq.(2.2.20)
  [akx,kx] = max(abs(AB(k:NA,k))./ ...
    max(abs([AB(k:NA,k + 1:NA) epss(1:NA - k + 1)]'))');
  if akx < eps, error('Singular matrix and No unique solution'); end
  mx = k + kx - 1;
  if kx > 1 % Row change if necessary
    tmp_row = AB(k,k:N);
    AB(k,k:N) = AB(mx,k:N);
    AB(mx,k:N) = tmp_row;
  end
  % Gauss forward elimination
  AB(k,k + 1:N) = AB(k,k + 1:N)/AB(k,k);
  AB(k,k) = 1; %make each diagonal element one
  for m = k + 1:NA
    AB(m,k + 1:N) = AB(m,k + 1:N) - AB(m,k)*AB(k,k + 1:N); %Eq.(2.2.5)
    AB(m,k) = 0;
  end
end
%backward substitution for an upper-triangular matrix equation
% having all the diagonal elements equal to one
x(NA,:) = AB(NA,NA + 1:N);
for m = NA-1: -1:1
  x(m,:) = AB(m,NA + 1:N) - AB(m,m + 1:NA)*x(m + 1:NA,:); %Eq.(2.2.7)
end


%do_gauss
A = [0 1 1; 2 -1 -1; 1 1 -1]; b = [2 0 1]'; %Eq.(2.2.8)
x = gauss(A,b)
x1 = A\b %for comparison with the result of backslash operation
(cf) The number of floating-point multiplications required in this routine “gauss()” is

$$
\begin{aligned}
&\sum_{k=1}^{NA}\{(NA-k+1)(NA+NB-k) + NA-k+1\} + \sum_{k=1}^{NA-1}(NA-k)NB\\
&= \sum_{k=1}^{NA} k(k+NB) - NB\sum_{k=1}^{NA} k + \sum_{k=1}^{NA} NA\cdot NB\\
&= \frac{1}{6}NA(NA+1)(2NA+1) + NA^2\,NB\\
&\approx \frac{1}{3}NA^3 \quad\text{for } NA \gg NB
\end{aligned}
\tag{2.2.18}
$$

where NA is the size of the matrix A, and NB is the column dimension of the RHS
matrix B.
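The closed form in Eq. (2.2.18) can be checked by simply adding up the per-stage counts. The script below is a sketch that takes the per-stage count written inside the first summation at face value; NB = 1 is an arbitrary choice. It also prints how close the total comes to $NA^3/3$ as NA grows.

```matlab
% Add up the per-stage multiplication counts of Eq. (2.2.18) and compare
% the total with the closed form and with the leading term NA^3/3.
NB = 1;                                   % one RHS column (arbitrary choice)
for NA = [10 100 1000]
  k1 = 1:NA;  k2 = 1:NA-1;
  direct = sum((NA - k1 + 1).*(NA + NB - k1) + NA - k1 + 1) + sum((NA - k2)*NB);
  closed = NA*(NA + 1)*(2*NA + 1)/6 + NA^2*NB;
  fprintf('NA = %4d: direct = %d, closed = %d, ratio to NA^3/3 = %.4f\n', ...
          NA, direct, closed, closed/(NA^3/3));
  assert(direct == closed)                % both are exact integers in double
end
```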


Here are several things to note. 

Remark 2.2. Partial Pivoting and Underdetermined/Inconsistent Case

1. In Gauss or Gauss-Jordan elimination, some row switching is performed
to avoid zero division. Even apart from that purpose, it may help reduce
the round-off error to choose

$$
\max\{|a_{mk}|,\ k \le m \le M\}
\tag{2.2.19}
$$

as the pivot element in the kth iteration through some row switching, which
is called ‘partial pivoting.’ Actually, it might be even better to choose

$$
\max\left\{\frac{|a_{mk}|}{\max\{|a_{mn}|,\ k \le n \le M\}},\ k \le m \le M\right\}
\tag{2.2.20}
$$

as the pivot element in the kth iteration, which is called ‘scaled partial
pivoting,’ or to do column switching as well as row switching for choosing
the best (largest) pivot element, which is called ‘full pivoting.’ Note that
if the columns are switched, the order of the unknown variables should be
interchanged accordingly.

2. What if some diagonal element $a_{kk}$ and all the elements below it in the
same column are zero and, besides, all the elements in the row including
$a_{kk}$ are also zero except the RHS element? It implies that some or all
rows of the coefficient matrix A are dependent on others, corresponding
to the case of redundancy (infinitely many solutions) or inconsistency (no
exact solution). If even the RHS element is zero, it should be declared
to be the case of redundancy. In this case, we can get rid of the all-zero
row(s) and then treat the problem as the underdetermined case handled in
Section 2.1.2. If the RHS element is the only nonzero element in the row, it should
be declared to be the case of inconsistency.
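The effect described in item 1 can be seen on a tiny system. The sketch below is a standard illustration rather than one of the book's examples (the entries are chosen only for demonstration): it runs two-equation elimination once as given and once after switching the rows, so that the entry of larger magnitude in the first column becomes the pivot, as in Eq. (2.2.19).

```matlab
% Tiny pivot example: without row switching, the huge multiplier 1/1e-20
% swamps the second row; with partial pivoting [Eq. (2.2.19)] it does not.
A = [1e-20 1; 1 1];  b = [1; 2];     % exact solution is extremely close to [1; 1]

% (1) Gauss elimination without row switching
m21 = A(2,1)/A(1,1);
u22 = A(2,2) - m21*A(1,2);   c2 = b(2) - m21*b(1);
x2 = c2/u22;   x1 = (b(1) - A(1,2)*x2)/A(1,1);
x_naive = [x1; x2]                    % x1 comes out as 0: badly wrong

% (2) The same steps after switching the rows (larger |a_m1| becomes the pivot)
Ap = A([2 1],:);  bp = b([2 1]);
m21 = Ap(2,1)/Ap(1,1);
u22 = Ap(2,2) - m21*Ap(1,2);   c2 = bp(2) - m21*bp(1);
x2 = c2/u22;   x1 = (bp(1) - Ap(1,2)*x2)/Ap(1,1);
x_pivot = [x1; x2]                    % close to [1; 1]
```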


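Example 2.2. Delicacy of Partial Pivoting. To get an actual feeling for the
delicacy of partial pivoting, consider the following systems of linear equations,
which apparently have $x^o = [1\ 1]^T$ as their solutions.

(a) $A_1x = b_1$ with

$$
A_1 = \begin{bmatrix}10^{-15} & 1\\ 1 & 10^{11}\end{bmatrix},\quad
b_1 = \begin{bmatrix}1 + 10^{-15}\\ 10^{11} + 1\end{bmatrix}
\tag{E2.2.1}
$$

Without any row switching, the Gauss elimination procedure would find us
the true solution if there were no quantization error:

$$
[A_1\ b_1] = \begin{bmatrix}10^{-15} & 1 & 1 + 10^{-15}\\ 1 & 10^{11} & 10^{11} + 1\end{bmatrix}
\xrightarrow[\text{elimination}]{\text{forward}}
\begin{bmatrix}1 & 10^{15} & 10^{15} + 1\\ 0 & 10^{11} - 10^{15} & 10^{11} - 10^{15}\end{bmatrix}
\xrightarrow[\text{substitution}]{\text{backward}}
x = \begin{bmatrix}1\\ 1\end{bmatrix}
$$

But, because of the round-off error, it will deviate from the true solution:

$$
\xrightarrow[\text{elimination}]{\text{forward}}
\begin{bmatrix}1 & 10^{15} = 9.999999999999999\text{e+}014 & 10^{15} + 1 = 1.000000000000001\text{e+}015\\
0 & 10^{11} - 10^{15} = -9.998999999999999\text{e+}014 & 10^{11} + 1 - (10^{15} + 1) = -9.999000000000000\text{e+}014\end{bmatrix}
\xrightarrow[\text{substitution}]{\text{backward}}
x = \begin{bmatrix}8.750000000000000\text{e-}001\\ 1.000000000000000\text{e+}000\end{bmatrix}
$$

If we enforce the strategy of partial pivoting or scaled partial pivoting, the
Gauss elimination procedure will give us a much better result:

$$
[A_1\ b_1] \xrightarrow{\text{row switching}}
\begin{bmatrix}1 & 10^{11} & 10^{11} + 1\\ 10^{-15} & 1 & 1 + 10^{-15}\end{bmatrix}
\xrightarrow[\text{elimination}]{\text{forward}}
\begin{bmatrix}1 & 10^{11} = 1.000\text{e+}011 & 10^{11} + 1 = 1.000000000010000\text{e+}011\\
0 & 1 - 10^{-4} = 9.999\text{e-}001 & 9.999000000000001\text{e-}001\end{bmatrix}
\xrightarrow[\text{substitution}]{\text{backward}}
x = \begin{bmatrix}9.999847412109375\text{e-}001\\ 1.000000000000000\text{e+}000\end{bmatrix}
$$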

(b) $A_2x = b_2$ with

$$
A_2 = \begin{bmatrix}10^{-14.6} & 1\\ 1 & 10^{15}\end{bmatrix},\quad
b_2 = \begin{bmatrix}1 + 10^{-14.6}\\ 10^{15} + 1\end{bmatrix}
\tag{E2.2.2}
$$

Without partial pivoting, the Gauss elimination procedure will give us a
quite good result:

$$
\xrightarrow[\text{elimination}]{\text{forward}}
\begin{bmatrix}1 & 10^{14.6} = 3.981071705534969\text{e+}014 & 10^{14.6} + 1 = 3.981071705534979\text{e+}014\\
0 & 6.018928294465030\text{e+}014 & 6.018928294465030\text{e+}014\end{bmatrix}
\xrightarrow[\text{substitution}]{\text{backward}}
x = \begin{bmatrix}1\\ 1\end{bmatrix}
$$

But, if we exchange the first row with the second row, which has the larger
element in the first column, according to the strategy of partial pivoting, the
Gauss elimination procedure will give us a rather surprisingly bad result:

$$
\xrightarrow[\text{elimination}]{\text{forward}}
\begin{bmatrix}1 & 10^{15} = 1.000000000000000\text{e+}015 & 10^{15} + 1 = 1.000000000000001\text{e+}015\\
0 & 1 - 10^{15}\cdot 10^{-14.6} = -1.5118864315095819 & 1 + 10^{-14.6} - (1 + 10^{15})\cdot 10^{-14.6} = -1.5118864315095821\end{bmatrix}
\xrightarrow[\text{substitution}]{\text{backward}}
x = \begin{bmatrix}0.7500000000000000\\ 1.0000000000000002\end{bmatrix}
$$

One might be happy to have the scaled partial pivoting scheme
[Eq. (2.2.20)], which does not switch the rows in this case, since the
relative magnitude (dominance) of $a_{11}$ in the first row is greater than that
of $a_{21}$ in the second row, that is, $10^{-14.6}/1 > 1/10^{15}$.

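(c) $A_3x = b_3$ with

$$
A_3 = \begin{bmatrix}10^{15} & 1\\ 1 & 10^{-14.6}\end{bmatrix},\quad
b_3 = \begin{bmatrix}10^{15} + 1\\ 1 + 10^{-14.6}\end{bmatrix}
\tag{E2.2.3}
$$

With any pivoting scheme, we don't need to switch the rows, since the
absolute magnitude of $a_{11}$ in the first row is greater than that of $a_{21}$ in the
second row, and its relative magnitude is not smaller (both are 1). Thus, the Gauss
elimination procedure will go as follows:

$$
\xrightarrow[\text{elimination}]{\text{forward}}
\begin{bmatrix}1 & 1.000000000000000\text{e-}015 & 1.000000000000001\text{e+}000\\
0 & 1.511886431509582\text{e-}015 & 1.332267629550188\text{e-}015\end{bmatrix}
\xrightarrow[\text{substitution}]{\text{backward}}
x = \begin{bmatrix}1.000000000000000\\ 0.8811955724875121\end{bmatrix}
$$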

(cf) Note that the coefficient matrix $A_3$ is the same as would be obtained by applying
the full pivoting scheme to $A_2$ to have the largest pivot element. This example
implies that Gauss elimination with the full pivoting scheme may produce a worse
result than would be obtained with the scaled partial pivoting scheme. As a matter of
fact, we cannot say that some pivoting scheme always yields a better solution than
other pivoting schemes, because the result depends on the random round-off error as
well as on the pivoting scheme (see Problem 2.2). But, in most cases, scaled partial
pivoting shows a reasonably good performance, and that is why we adopt it in our
routine “gauss()”.
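To see concretely why the two rules disagree on $A_2$ in (E2.2.2), the sketch below (not part of the book's code) computes the row each rule would pick for the first column: the largest $|a_{m1}|$ for partial pivoting, Eq. (2.2.19), and the largest ratio of $|a_{m1}|$ to the largest magnitude in row m for scaled partial pivoting, Eq. (2.2.20). Note that “gauss()” computes the row scale slightly differently (over columns k+1 to NA, plus eps).

```matlab
% Which row would each pivoting rule choose for the first column of A2 in (E2.2.2)?
A2 = [10^(-14.6) 1; 1 1e15];
k = 1;
% Partial pivoting, Eq. (2.2.19): largest |a_mk| in column k
[~, row_partial] = max(abs(A2(k:end,k)));
% Scaled partial pivoting, Eq. (2.2.20): |a_mk| over the largest |a_mn| in row m
ratios = abs(A2(k:end,k)) ./ max(abs(A2(k:end,k:end)), [], 2);
[~, row_scaled] = max(ratios);
fprintf('partial pivoting picks row %d, scaled partial pivoting picks row %d\n', ...
        k - 1 + row_partial, k - 1 + row_scaled);
```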

Remark 2.3. Computing Error, Singularity, and Ill-Condition

1. As the size of the matrix grows, the round-off errors are apt to accumulate
and propagate in matrix operations to such a degree that zero may
appear to be an absolutely small number, or a nonzero number very close
to zero may appear to be zero. Therefore, it is not a simple task to
determine whether a zero or a number very close to zero is a real zero or
not.

2. It is desirable, but not easy, to discern the case of singularity
from the case of ill-condition and to distinguish the case of redundancy
from the case of inconsistency. In order to make such a qualitative
judgment correctly on the basis of some quantitative analysis,
we should be equipped with theoretical knowledge as well as practical
experience.

3. There are several criteria by which we judge the degree of ill-condition,
such as how far $AA^{-1}$ deviates from the identity matrix, how far
$\det\{A\}\det\{A^{-1}\}$ stays away from one (1), and so on:

$$
AA^{-1} = I,\quad [A^{-1}]^{-1} = A,\quad \det(A)\det(A^{-1}) = 1
\tag{2.2.21}
$$

The MATLAB command cond() tells us the degree of ill-condition for a
given matrix by the size of the condition number, which is defined as

$$
\mathrm{cond}(A) = \|A\|\,\|A^{-1}\| \quad\text{with}\quad
\|A\| = \sqrt{\text{largest eigenvalue of } A^TA},
$$

i.e., the largest singular value of A.
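The definition above can be checked numerically. In the sketch below (the 5 × 5 Hilbert matrix of Example 2.3 is just a convenient test case), the 2-norm returned by norm() is compared with the largest singular value and with the square root of the largest eigenvalue of $A^TA$, and cond() is compared with the ratio of the largest to the smallest singular value.

```matlab
A = hilb(5);                        % 5 x 5 Hilbert matrix, cf. Example 2.3
s = svd(A);                         % singular values in decreasing order
norm2 = sqrt(max(eig(A'*A)));       % 2-norm from the eigenvalues of A'*A
fprintf('norm(A) = %.6f, s(1) = %.6f, sqrt(max eig(A''A)) = %.6f\n', ...
        norm(A), s(1), norm2);
fprintf('cond(A) = %.6e, s(1)/s(end) = %.6e\n', cond(A), s(1)/s(end));
```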

Example 2.3. The Hilbert matrix defined by

$$
A = [a_{mn}] = \left[\frac{1}{m + n - 1}\right]
\tag{E2.3}
$$

is notorious for its ill-condition.

We increase the dimension of the Hilbert matrix from N = 7 to 12 and make
use of the MATLAB commands cond() and det() to compute the condition
number and det(A)det(A^-1) in the MATLAB program “do_condition”. Especially
for N = 10, we will see the degree of discrepancy between $AA^{-1}$ and
the identity matrix. Note that the number RCOND following the warning message
about near-singularity or ill-condition given by MATLAB is a reciprocal condition
number, which can be computed by the rcond() command and is supposed
to be close to 1 for a well-conditioned matrix and close to 0 for a badly conditioned one.


%do_condition.m
for m = 1:6
  for n = 1:6
    A(m,n) = 1/(m + n - 1); %A = hilb(6), Eq.(E2.3)
  end
end
for N = 7:12
  for m = 1:N, A(m,N) = 1/(m + N - 1); end
  for n = 1:N - 1, A(N,n) = 1/(N + n - 1); end
  c = cond(A); d = det(A)*det(A^-1);
  fprintf('N = %2d: cond(A) = %e, det(A)det(A^-1) = %8.6f\n', N, c, d);
  if N == 10, AAI = A*A^-1, end
end


N = 7: cond(A) = 4.753674e+008, det(A)det(A^-1) = 1.000000
N = 8: cond(A) = 1.525758e+010, det(A)det(A^-1) = 1.000000
N = 9: cond(A) = 4.931532e+011, det(A)det(A^-1) = 1.000001
N = 10: cond(A) = 1.602534e+013, det(A)det(A^-1) = 0.999981
AAI =   (columns 5 through 10; columns 1 through 4 are not reproduced here)
   ...   0.0002  -0.0005   0.0010  -0.0010   0.0004  -0.0001
   ...   0.0002  -0.0004   0.0007  -0.0007   0.0003  -0.0001
   ...   0.0002  -0.0004   0.0006  -0.0006   0.0003  -0.0000
   ...   0.0001  -0.0003   0.0005  -0.0006   0.0003  -0.0000
   ...   1.0001  -0.0003   0.0005  -0.0005   0.0002  -0.0000
   ...   0.0001   0.9998   0.0004  -0.0004   0.0002  -0.0000
   ...   0.0001  -0.0002   1.0003  -0.0004   0.0002  -0.0000
   ...   0.0001  -0.0002   0.0003   0.9997   0.0002  -0.0000
   ...   0.0001  -0.0001   0.0003  -0.0003   1.0001  -0.0000
   ...   0.0001  -0.0002   0.0003  -0.0003   0.0001   1.0000
N = 11: cond(A) = 5.218389e+014, det(A)det(A^-1) = 1.000119
Warning: Matrix is close to singular or badly scaled.
         Results may be inaccurate. RCOND = 3.659249e-017.
> In C:\MATLAB\nma\do_condition.m at line 12
N = 12: cond(A) = 1.768065e+016, det(A)det(A^-1) = 1.015201


2.2.3 Gauss-Jordan Elimination 

While Gauss elimination consists of forward elimination and backward substitution
as explained in Section 2.2.1, Gauss-Jordan elimination consists of
forward/backward elimination, which makes the coefficient matrix A an identity
matrix so that the resulting RHS vector will appear as the solution.
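For simplicity, we start from the triangular matrix-vector equation (2.2.3)
obtained by applying the forward elimination:

$$
\begin{bmatrix}a_{11} & a_{12} & a_{13} & b_1\\ 0 & a_{22}^{(1)} & a_{23}^{(1)} & b_2^{(1)}\\ 0 & 0 & a_{33}^{(2)} & b_3^{(2)}\end{bmatrix}
$$

First, we divide the last row by $a_{33}^{(2)}$:

$$
\begin{bmatrix}a_{11} & a_{12} & a_{13} & b_1\\ 0 & a_{22}^{(1)} & a_{23}^{(1)} & b_2^{(1)}\\ 0 & 0 & a_{33}^{(3)} = 1 & b_3^{(3)} = b_3^{(2)}/a_{33}^{(2)}\end{bmatrix}
$$

and subtract (the third row $\times\, a_{m3}^{(m-1)}$, m = 1, 2) from the above two rows to get

$$
\begin{bmatrix}a_{11} & a_{12} & a_{13}^{(1)} = 0 & b_1^{(1)} = b_1 - a_{13}b_3^{(3)}\\ 0 & a_{22}^{(1)} & a_{23}^{(2)} = 0 & b_2^{(2)} = b_2^{(1)} - a_{23}^{(1)}b_3^{(3)}\\ 0 & 0 & 1 & b_3^{(3)}\end{bmatrix}
$$

Now, we divide the second row by $a_{22}^{(1)}$:

$$
\begin{bmatrix}a_{11} & a_{12} & 0 & b_1^{(1)}\\ 0 & a_{22}^{(2)} = 1 & 0 & b_2^{(3)} = b_2^{(2)}/a_{22}^{(1)}\\ 0 & 0 & 1 & b_3^{(3)}\end{bmatrix}
$$

and subtract (the second row $\times\, a_{m2}^{(m-1)}$, m = 1) from the above first row to get

$$
\begin{bmatrix}a_{11} & 0 & 0 & b_1^{(2)} = b_1^{(1)} - a_{12}b_2^{(3)}\\ 0 & 1 & 0 & b_2^{(3)}\\ 0 & 0 & 1 & b_3^{(3)}\end{bmatrix}
$$

Lastly, we divide the first row by $a_{11}$ to get

$$
\begin{bmatrix}1 & 0 & 0 & b_1^{(3)} = b_1^{(2)}/a_{11}\\ 0 & 1 & 0 & b_2^{(3)}\\ 0 & 0 & 1 & b_3^{(3)}\end{bmatrix}
$$

which denotes a system of linear equations having an identity matrix as the
coefficient matrix

$$
Ix = b^{(3)} = [b_1^{(3)}\ b_2^{(3)}\ b_3^{(3)}]^T
$$

and, consequently, we take the RHS vector $b^{(3)}$ as the final solution.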

Note that we don't have to separate the two steps, forward and backward
elimination. In other words, during the forward elimination, we can do the pivoting
operations in such a way that the pivot becomes one and the other elements
above and below the pivot in the same column become zeros.

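Consider the following system of linear equations:

$$
\begin{bmatrix}-1 & -2 & 2\\ 1 & 1 & -1\\ 1 & 2 & -1\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}
= \begin{bmatrix}-1\\ 1\\ 2\end{bmatrix}
\tag{2.2.28}
$$

We construct the augmented matrix by combining the coefficient matrix and the
RHS vector to write

$$
\begin{bmatrix}a_{11} & a_{12} & a_{13} & b_1\\ a_{21} & a_{22} & a_{23} & b_2\\ a_{31} & a_{32} & a_{33} & b_3\end{bmatrix}
= \begin{bmatrix}-1 & -2 & 2 & -1\\ 1 & 1 & -1 & 1\\ 1 & 2 & -1 & 2\end{bmatrix}
\begin{matrix}: r_1\\ : r_2\\ : r_3\end{matrix}
\tag{2.2.29}
$$

and apply the Gauss-Jordan elimination procedure.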

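First, we divide the first row $r_1$ by $a_{11} = -1$ to make the new first row $r_1^{(1)}$
have the pivot $a_{11}^{(1)} = 1$ and subtract $a_{m1}\times r_1^{(1)}$ (m = 2, 3) from the second and
third rows $r_2$ and $r_3$ to get

$$
\begin{matrix} r_1 \div (-1) \to r_1^{(1)}\\ r_2 - 1\times r_1^{(1)} \to r_2^{(1)}\\ r_3 - 1\times r_1^{(1)} \to r_3^{(1)} \end{matrix}
\quad
\begin{bmatrix}a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} & b_1^{(1)}\\ a_{21}^{(1)} & a_{22}^{(1)} & a_{23}^{(1)} & b_2^{(1)}\\ a_{31}^{(1)} & a_{32}^{(1)} & a_{33}^{(1)} & b_3^{(1)}\end{bmatrix}
= \begin{bmatrix}1 & 2 & -2 & 1\\ 0 & -1 & 1 & 0\\ 0 & 0 & 1 & 1\end{bmatrix}
\begin{matrix}: r_1^{(1)}\\ : r_2^{(1)}\\ : r_3^{(1)}\end{matrix}
\tag{2.2.30a}
$$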

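Then, we divide the second row $r_2^{(1)}$ by $a_{22}^{(1)} = -1$ to make the new second row
$r_2^{(2)}$ have the pivot $a_{22}^{(2)} = 1$ and subtract $a_{m2}^{(1)}\times r_2^{(2)}$ (m = 1, 3) from the first and
third rows $r_1^{(1)}$ and $r_3^{(1)}$ to get

$$
\begin{matrix} r_1^{(1)} - 2\times r_2^{(2)} \to r_1^{(2)}\\ r_2^{(1)} \div (-1) \to r_2^{(2)}\\ r_3^{(1)} - 0\times r_2^{(2)} \to r_3^{(2)} \end{matrix}
\quad
\begin{bmatrix}a_{11}^{(2)} & a_{12}^{(2)} & a_{13}^{(2)} & b_1^{(2)}\\ a_{21}^{(2)} & a_{22}^{(2)} & a_{23}^{(2)} & b_2^{(2)}\\ a_{31}^{(2)} & a_{32}^{(2)} & a_{33}^{(2)} & b_3^{(2)}\end{bmatrix}
= \begin{bmatrix}1 & 0 & 0 & 1\\ 0 & 1 & -1 & 0\\ 0 & 0 & 1 & 1\end{bmatrix}
\begin{matrix}: r_1^{(2)}\\ : r_2^{(2)}\\ : r_3^{(2)}\end{matrix}
\tag{2.2.30b}
$$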

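Lastly, we divide the third row $r_3^{(2)}$ by $a_{33}^{(2)} = 1$ to make the new third row
$r_3^{(3)}$ have the pivot $a_{33}^{(3)} = 1$ and subtract $a_{m3}^{(2)}\times r_3^{(3)}$ (m = 1, 2) from the first and
second rows $r_1^{(2)}$ and $r_2^{(2)}$ to get

$$
\begin{matrix} r_1^{(2)} - 0\times r_3^{(3)} \to r_1^{(3)}\\ r_2^{(2)} - (-1)\times r_3^{(3)} \to r_2^{(3)}\\ r_3^{(2)} \div 1 \to r_3^{(3)} \end{matrix}
\quad
\begin{bmatrix}a_{11}^{(3)} & a_{12}^{(3)} & a_{13}^{(3)} & b_1^{(3)}\\ a_{21}^{(3)} & a_{22}^{(3)} & a_{23}^{(3)} & b_2^{(3)}\\ a_{31}^{(3)} & a_{32}^{(3)} & a_{33}^{(3)} & b_3^{(3)}\end{bmatrix}
= \begin{bmatrix}1 & 0 & 0 & 1 = x_1\\ 0 & 1 & 0 & 1 = x_2\\ 0 & 0 & 1 & 1 = x_3\end{bmatrix}
\begin{matrix}: r_1^{(3)}\\ : r_2^{(3)}\\ : r_3^{(3)}\end{matrix}
\tag{2.2.30c}
$$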

After having the identity matrix-vector form like this, we take the RHS vector 
as the solution. 

The general formula applicable to Gauss-Jordan elimination is the same as
Eq. (2.2.5), except that the index set is $m \ne k$, that is, all the numbers from
m = 1 to m = M except m = k. Interested readers are encouraged to write
their own routines to implement this algorithm (see Problem 2.3).
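For reference while working on that problem, here is a minimal sketch of the procedure applied to Eq. (2.2.28). It is deliberately bare: it performs no row switching, so it assumes that every pivot it meets is nonzero (true for this example, whose pivots are -1, -1, and 1 as in Eqs. (2.2.30a) to (2.2.30c)). A robust routine would add the scaled partial pivoting used in “gauss()”.

```matlab
% Gauss-Jordan elimination WITHOUT pivoting, applied to Eq. (2.2.28).
% Assumes every pivot encountered is nonzero (true for this example).
A = [-1 -2 2; 1 1 -1; 1 2 -1];  b = [-1; 1; 2];
AB = [A b];                        % augmented matrix, Eq. (2.2.29)
N = size(A,1);
for k = 1:N
  AB(k,:) = AB(k,:)/AB(k,k);       % make the pivot one
  for m = [1:k-1, k+1:N]           % every row except the pivot row (m ~= k)
    AB(m,:) = AB(m,:) - AB(m,k)*AB(k,:);   % zero out column k above/below the pivot
  end
end
x = AB(:,end)                      % the RHS column now holds the solution
```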


2.3 INVERSE MATRIX 

In the previous section, we looked over some algorithms to solve a system of
linear equations. We can use such algorithms to solve several systems of linear
equations having the same coefficient matrix

$$
Ax_1 = b_1,\quad Ax_2 = b_2,\quad \ldots,\quad Ax_{NB} = b_{NB}
$$

by putting different RHS vectors into one RHS matrix as

$$
A[x_1\ x_2\ \cdots\ x_{NB}] = [b_1\ b_2\ \cdots\ b_{NB}],\qquad AX = B
$$

If we substitute an identity matrix I for B in this equation, we will get the matrix
inverse $X = A^{-1}I = A^{-1}$. We, however, usually use the MATLAB command
inv(A) or A^-1 to compute the inverse of a matrix A.
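The idea of substituting I for B can be tried directly with the backslash operator, which accepts a matrix right-hand side. The sketch below uses the coefficient matrix of Eq. (2.2.8) (an arbitrary nonsingular example) and compares the result with inv(A).

```matlab
A = [0 1 1; 2 -1 -1; 1 1 -1];    % coefficient matrix of Eq. (2.2.8)
N = size(A,1);
X = A\eye(N);                    % solve A*X = I, i.e., NB = N right-hand sides
fprintf('max |X - inv(A)| = %.2e\n', max(max(abs(X - inv(A)))));
fprintf('max |A*X - I|   = %.2e\n', max(max(abs(A*X - eye(N)))));
```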


2.4 DECOMPOSITION (FACTORIZATION) 

2.4.1 LU Decomposition (Factorization): Triangularization 

LU decomposition (factorization) of a nonsingular (square) matrix A means 
expressing the matrix as the multiplication of a lower triangular matrix L and 
an upper triangular matrix U, where a lower/upper triangular matrix is a matrix 
having no nonzero elements above/below the diagonal. For the case where some 
row switching operation is needed like in the Gauss elimination, we include a 
permutation matrix P representing the necessary row switching operation(s) to 
write the LU decomposition as 


$$
PA = LU
\tag{2.4.1}
$$
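The usage of a permutation matrix is exemplified by

$$
PA = \begin{bmatrix}0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0\end{bmatrix}
\begin{bmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix}
= \begin{bmatrix}a_{31} & a_{32} & a_{33}\\ a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\end{bmatrix}
\tag{2.4.2}
$$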

which denotes switching the first and third rows followed by switching the second 
and third rows. An interesting and useful property of the permutation matrix is 
that its transpose agrees with its inverse. 

$$
P^TP = I,\qquad P^T = P^{-1}
\tag{2.4.3}
$$
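Both facts are easy to confirm in MATLAB by building P from the rows of the identity matrix. In the sketch below, magic(3) is just an arbitrary 3 × 3 test matrix, and the row order [3 1 2] reproduces the P of the example above.

```matlab
A = magic(3);                      % any 3 x 3 matrix will do
I = eye(3);
P = I([3 1 2],:);                  % P = [0 0 1; 1 0 0; 0 1 0]
disp(isequal(P*A, A([3 1 2],:)))   % PA lists rows 3, 1, 2 of A  -> 1 (true)
disp(isequal(P'*P, I))             % P'P = I, so P' = inv(P)     -> 1 (true)
```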

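To take a close look at the LU decomposition, we consider a 3 × 3 nonsingular
matrix:

$$
\begin{aligned}
\begin{bmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix}
&= \begin{bmatrix}1 & 0 & 0\\ l_{21} & 1 & 0\\ l_{31} & l_{32} & 1\end{bmatrix}
\begin{bmatrix}u_{11} & u_{12} & u_{13}\\ 0 & u_{22} & u_{23}\\ 0 & 0 & u_{33}\end{bmatrix}\\
&= \begin{bmatrix}u_{11} & u_{12} & u_{13}\\ l_{21}u_{11} & l_{21}u_{12} + u_{22} & l_{21}u_{13} + u_{23}\\ l_{31}u_{11} & l_{31}u_{12} + l_{32}u_{22} & l_{31}u_{13} + l_{32}u_{23} + u_{33}\end{bmatrix}
\end{aligned}
\tag{2.4.4}
$$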

First, equating the first rows of both sides yields

$$
u_{1n} = a_{1n},\quad n = 1, 2, 3
\tag{2.4.5a}
$$

Then, equating the second rows of both sides yields

$$
a_{21} = l_{21}u_{11},\quad a_{22} = l_{21}u_{12} + u_{22},\quad a_{23} = l_{21}u_{13} + u_{23}
$$

from which we can get

$$
l_{21} = a_{21}/u_{11},\quad u_{22} = a_{22} - l_{21}u_{12},\quad u_{23} = a_{23} - l_{21}u_{13}
\tag{2.4.5b}
$$

Now, equating the third rows of both sides yields

$$
a_{31} = l_{31}u_{11},\quad a_{32} = l_{31}u_{12} + l_{32}u_{22},\quad a_{33} = l_{31}u_{13} + l_{32}u_{23} + u_{33}
$$

from which we can get

$$
l_{31} = a_{31}/u_{11},\quad l_{32} = (a_{32} - l_{31}u_{12})/u_{22},\quad u_{33} = (a_{33} - l_{31}u_{13}) - l_{32}u_{23}
\tag{2.4.5c}
$$
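Eqs. (2.4.5a) to (2.4.5c) can be evaluated one by one for a concrete matrix. The sketch below does so for the matrix `[1 2 5; 0.5 4 8.5; 0.2 1.6 7.4]`, chosen because its pivots are nonzero so that no row switching is needed (it is the row-permuted matrix PA from the “lu_dcmp()” run later in this section), and then checks that LU reproduces it.

```matlab
A = [1 2 5; 0.5 4 8.5; 0.2 1.6 7.4];    % nonzero pivots: no row switching needed
u11 = A(1,1); u12 = A(1,2); u13 = A(1,3);                             % Eq. (2.4.5a)
l21 = A(2,1)/u11; u22 = A(2,2) - l21*u12; u23 = A(2,3) - l21*u13;     % Eq. (2.4.5b)
l31 = A(3,1)/u11; l32 = (A(3,2) - l31*u12)/u22;                       % Eq. (2.4.5c)
u33 = (A(3,3) - l31*u13) - l32*u23;
L = [1 0 0; l21 1 0; l31 l32 1]
U = [u11 u12 u13; 0 u22 u23; 0 0 u33]
fprintf('max |L*U - A| = %.2e\n', max(max(abs(L*U - A))));
```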

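In order to put these formulas in one framework to generalize them for matrices
having dimension greater than 3, we split this procedure into two steps
and write the intermediate lower/upper triangular matrices into one matrix for
compactness as

$$
\text{step 1: }
\begin{bmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix}
\to
\begin{bmatrix}u_{11} = a_{11} & u_{12} = a_{12} & u_{13} = a_{13}\\
l_{21} = a_{21}/u_{11} & a_{22}^{(1)} = a_{22} - l_{21}u_{12} & a_{23}^{(1)} = a_{23} - l_{21}u_{13}\\
l_{31} = a_{31}/u_{11} & a_{32}^{(1)} = a_{32} - l_{31}u_{12} & a_{33}^{(1)} = a_{33} - l_{31}u_{13}\end{bmatrix}
\tag{2.4.6a}
$$

$$
\text{step 2: }
\to
\begin{bmatrix}u_{11} & u_{12} & u_{13}\\
l_{21} & u_{22} = a_{22}^{(1)} & u_{23} = a_{23}^{(1)}\\
l_{31} & l_{32} = a_{32}^{(1)}/u_{22} & a_{33}^{(2)} = a_{33}^{(1)} - l_{32}u_{23}\end{bmatrix}
\tag{2.4.6b}
$$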


This leads to an LU decomposition algorithm generalized for an NA × NA
nonsingular matrix, as described in the following box. The MATLAB routine
“lu_dcmp()” implements this algorithm to find not only the lower/upper
triangular matrices L and U, but also the permutation matrix P. We run it for
a 3 × 3 matrix to get L, U, and P and then reconstruct the matrix $P^{-1}LU = A$
from L, U, and P to ascertain whether the result is right.


function [L,U,P] = lu_dcmp(A)
%This gives LU decomposition of A with the permutation matrix P
% denoting the row switch(exchange) during factorization
NA = size(A,1);
AP = [A eye(NA)]; %augment with the permutation matrix
for k = 1:NA - 1
  %Partial Pivoting at AP(k,k)
  [akx, kx] = max(abs(AP(k:NA,k)));
  if akx < eps
    error('Singular matrix and No LU decomposition')
  end
  mx = k + kx - 1;
  if kx > 1 % Row change if necessary
    tmp_row = AP(k,:);
    AP(k,:) = AP(mx,:);
    AP(mx,:) = tmp_row;
  end
  % LU decomposition
  for m = k + 1:NA
    AP(m,k) = AP(m,k)/AP(k,k); %Eq.(2.4.8b)
    AP(m,k + 1:NA) = AP(m,k + 1:NA) - AP(m,k)*AP(k,k + 1:NA); %Eq.(2.4.9)
  end
end
P = AP(1:NA, NA + 1:NA + NA); %Permutation matrix
for m = 1:NA
  for n = 1:NA
    if m == n, L(m,m) = 1.; U(m,m) = AP(m,m);
    elseif m > n, L(m,n) = AP(m,n); U(m,n) = 0.;
    else L(m,n) = 0.; U(m,n) = AP(m,n);
    end
  end
end
if nargout == 0, disp('L*U = P*A with'); L,U,P, end
%You can check if P'*L*U = A?
(cf) The number of floating-point multiplications required in this routine “lu_dcmp()” is

$$
\begin{aligned}
\sum_{k=1}^{NA-1}(NA-k)(NA-k+1) &= \sum_{k=1}^{NA-1}\{NA(NA+1) - (2NA+1)k + k^2\}\\
&= (NA-1)NA(NA+1) - \tfrac{1}{2}(2NA+1)(NA-1)NA + \tfrac{1}{6}(NA-1)NA(2NA-1)\\
&= \tfrac{1}{3}(NA-1)NA(NA+1) \approx \tfrac{1}{3}NA^3
\end{aligned}
\tag{2.4.7}
$$

with NA: the size of matrix A


0. Initialize $A^{(0)} = A$, or equivalently, $a_{mn}^{(0)} = a_{mn}$ for m, n = 1 : NA.

1. Let k = 1.

2. If $a_{kk}^{(k-1)} = 0$, do an appropriate row switching operation so that
$a_{kk}^{(k-1)} \ne 0$. When it is not possible, declare the case of singularity and stop.

3. Just leave the kth row as it is, and compute the multipliers below the pivot:

$$
a_{kn}^{(k)} = a_{kn}^{(k-1)} = u_{kn}\quad\text{for } n = k : NA
\tag{2.4.8a}
$$

$$
a_{mk}^{(k)} = a_{mk}^{(k-1)}/a_{kk}^{(k-1)} = l_{mk}\quad\text{for } m = k+1 : NA
\tag{2.4.8b}
$$

4. Update the remaining submatrix:

$$
a_{mn}^{(k)} = a_{mn}^{(k-1)} - a_{mk}^{(k)}a_{kn}^{(k)}\quad\text{for } m, n = k+1 : NA
\tag{2.4.9}
$$

5. Increment k by 1 and if k ≤ NA - 1, go to step 2; otherwise, go to step 6.

6. Set the part of the matrix $A^{(NA-1)}$ below the diagonal to L (lower triangular
matrix with the diagonal of 1's) and the part on and above the
diagonal to U (upper triangular matrix).


>>A = [1 2 5; 0.2 1.6 7.4; 0.5 4 8.5];
>>[L,U,P] = lu_dcmp(A) %LU decomposition
L = 1.0   0     0       U = 1   2   5       P = 1   0   0
    0.5   1.0   0           0   3   6           0   0   1
    0.2   0.4   1.0         0   0   4           0   1   0
>>P'*L*U - A %check the validity of the result (P' = P^-1)
ans = 0   0   0
      0   0   0
      0   0   0
>>[L,U,P] = lu(A) %for comparison with the MATLAB built-in function


What is the LU decomposition for? It can be used for solving a system of
linear equations as

$$
Ax = b
\tag{2.4.10}
$$

Once we have the LU decomposition of the coefficient matrix $A = P^TLU$, it is
more efficient to use the lower/upper triangular matrices for solving Eq. (2.4.10)
than to apply the Gauss elimination method. The procedure is as follows:

$$
P^TLUx = b,\quad LUx = Pb,\quad Ux = L^{-1}Pb,\quad x = U^{-1}L^{-1}Pb
\tag{2.4.11}
$$

Note that premultiplying a vector by $L^{-1}$ and by $U^{-1}$ can be performed by
forward and backward substitution, respectively. The following
program “do_lu_dcmp.m” applies the LU decomposition method, the Gauss
elimination algorithm, and the MATLAB operators ‘\’ and ‘inv’ or ‘^-1’ to
solve Eq. (2.4.10), where A is the five-dimensional Hilbert matrix (introduced
in Example 2.3) and $b = Ax^o$ with $x^o = [1\ 1\ 1\ 1\ 1]^T$. The residual errors
$\|Ax_i - b\|$ of the solutions obtained by the four methods and the numbers of
floating-point operations required for carrying them out are listed in Table 2.1.
The table shows that, once the inverse matrix $A^{-1}$ is available, the inverse matrix
method, requiring only $N^2$ multiplications/additions (N is the dimension of the
coefficient matrix or the number of unknown variables), is the most efficient in
computation, but the worst in accuracy. Therefore, if we need to repeatedly
solve the system of linear equations with the same coefficient matrix A for different
RHS vectors, it is a reasonable choice in terms of computation time and
accuracy to save the LU decomposition of the coefficient matrix A and apply the
forward/backward substitution process.
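The "factor once, substitute many times" strategy can be sketched with MATLAB's built-in lu(), which returns P, L, and U with PA = LU. The two true solutions below are hypothetical test data. MATLAB's backslash recognizes a triangular coefficient matrix and then solves by substitution, so each new RHS costs only on the order of $N^2$ operations.

```matlab
A = hilb(5);                          % the 5 x 5 Hilbert matrix of Example 2.3
[L,U,P] = lu(A);                      % factor once: P*A = L*U
X0 = [ones(5,1), (1:5)'];             % two hypothetical true solutions
for j = 1:size(X0,2)
  b = A*X0(:,j);                      % a new right-hand side
  y = L\(P*b);                        % forward substitution (L is lower triangular)
  x = U\y;                            % backward substitution (U is upper triangular)
  fprintf('RHS %d: residual = %.2e, error = %.2e\n', ...
          j, norm(A*x - b), norm(x - X0(:,j)));
end
```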


%do_lu_dcmp
% Use LU decomposition, Gauss elimination to solve Ax = b
A = hilb(5);
[L,U,P] = lu_dcmp(A); %LU decomposition
x = [1 -2 3 -4 5 -6 7 -8 9 -10]';
b = A*x(1:size(A,1));
flops(0), x_lu = backsubst(U,forsubst(L,P*b)); %Eq.(2.4.11)
flps(1) = flops; % assuming that we have already got L\U decomposition
flops(0), x_gs = gauss(A,b); flps(3) = flops;
flops(0), x_bs = A\b; flps(4) = flops;
AI = A^-1; flops(0), x_iv = AI*b; flps(5) = flops;
% assuming that we have already got the inverse matrix
disp('     x_lu      x_gs      x_bs      x_iv')
format short e
solutions = [x_lu x_gs x_bs x_iv]
errs = [norm(A*x_lu - b) norm(A*x_gs - b) norm(A*x_bs - b) norm(A*x_iv - b)]
format short, flps

function x = forsubst(L,B)
%forward substitution for a lower-triangular matrix equation Lx = B
N = size(L,1);
x(1,:) = B(1,:)/L(1,1);
for m = 2:N
  x(m,:) = (B(m,:) - L(m,1:m - 1)*x(1:m - 1,:))/L(m,m);
end

function x = backsubst(U,B)
%backward substitution for an upper-triangular matrix equation Ux = B
N = size(U,2);
x(N,:) = B(N,:)/U(N,N);
for m = N-1: -1:1
  x(m,:) = (B(m,:) - U(m,m + 1:N)*x(m + 1:N,:))/U(m,m);
end
Table 2.1 Residual Error and the Number of Floating-Point Operations of Various
Solutions

Method                                      ||Ax_i - b||    # of flops
tmp = forsubst(L,P*b); backsubst(U,tmp)     1.3597e-016     123
gauss(A,b)                                  5.5511e-017     224
A\b                                         1.7554e-016     155
A^-1*b                                      3.0935e-012      50

(cf) The numbers of flops for the LU decomposition and the inverse of the matrix A are not counted.
(cf) Note that the command ‘flops’ to count the number of floating-point operations is no longer
available in MATLAB 6.x and higher versions.


2.4.2 Other Decomposition (Factorization): Cholesky, QR, and SVD 

There are several other matrix decompositions such as Cholesky decomposition, 
QR decomposition, and singular value decomposition (SVD). Instead of looking 
into the details of these algorithms, we will simply survey the MATLAB built-in 
functions implementing these decompositions. 

Cholesky decomposition factors a positive definite symmetric/Hermitian matrix
into an upper triangular matrix premultiplied by its transpose as

$$
A = U^TU\quad (U\text{: an upper triangular matrix})
\tag{2.4.12}
$$

and is implemented by the MATLAB built-in function chol().

(cf) If a (complex-valued) matrix A satisfies $A^{*T} = A$, that is, the conjugate transpose
of the matrix equals the matrix itself, it is said to be Hermitian. It is said to be just symmetric
in the case of a real-valued matrix with $A^T = A$.

(cf) If a square matrix A satisfies $x^{*T}Ax > 0\ \ \forall x \ne 0$, the matrix is said to be positive
definite (see Appendix B).

>>A = [2 3 4; 3 5 6; 4 6 9]; %a positive definite symmetric matrix
>>U = chol(A) %Cholesky decomposition
U = 1.4142   2.1213   2.8284
         0   0.7071   0.0000
         0        0   1.0000
>>U'*U - A %to check if the result is right
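Once A = U'U is available, Ax = b can be solved by two triangular solves, one forward with the lower triangular U' and one backward with U, much as with the LU factors in Eq. (2.4.11). The sketch below uses the matrix from the session above with a right-hand side built from an arbitrarily chosen solution.

```matlab
A = [2 3 4; 3 5 6; 4 6 9];    % positive definite symmetric matrix from above
b = A*[1; -1; 2];             % RHS built from a chosen solution [1; -1; 2]
U = chol(A);                  % A = U'*U, Eq. (2.4.12)
y = U'\b;                     % forward substitution with the lower triangular U'
x = U\y;                      % backward substitution with the upper triangular U
disp(x.')
```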

QR decomposition is to express a square or rectangular matrix as the product
of an orthogonal (unitary) matrix Q and an upper triangular matrix R as

$$
A = QR
\tag{2.4.13}
$$

where $Q^TQ = I$ ($Q^{*T}Q = I$). This is implemented by the MATLAB built-in
function qr().

(cf) If all the columns of a (complex-valued) matrix A are orthonormal to each other,
that is, $A^{*T}A = I$ or, equivalently, $A^{*T} = A^{-1}$, it is said to be unitary. It is said to be
orthogonal in the case of a real-valued matrix with $A^T = A^{-1}$.


SVD (singular value decomposition) is to express an M × N matrix A in the
following form

$$
A = USV^T
\tag{2.4.14}
$$

where U is an orthogonal (unitary) M × M matrix, V is an orthogonal (unitary)
N × N matrix, and S is a real diagonal M × N matrix having the singular
values of A (the square roots of the eigenvalues of $A^TA$) in decreasing
order on its diagonal. This is implemented by the MATLAB built-in function
svd().

>>A = [1 2; 2 3; 3 5]; %a rectangular matrix
>>[U,S,V] = svd(A) %Singular Value Decomposition
U = 0.3092   0.7557  -0.5774     S = 7.2071        0     V = 0.5184  -0.8552
    0.4998  -0.6456  -0.5774              0   0.2403         0.8552   0.5184
    0.8090   0.1100   0.5774              0        0
>>err = U*S*V' - A %to check if the result is right
err = 1.0e-015 *
     -0.2220  -0.2220
           0        0
      0.4441        0

2.5 ITERATIVE METHODS TO SOLVE EQUATIONS 

2.5.1 Jacobi Iteration 

Let us consider the equation 

$$
3x + 1 = 0
$$

which can be cast into an iterative scheme as

$$
2x = -x - 1;\quad x = -\frac{x + 1}{2}\ \rightarrow\ x_{k+1} = -\frac{1}{2}x_k - \frac{1}{2}
$$

Starting from some initial value $x_0$ for k = 0, we can incrementally change k
by 1 each time to proceed as follows:

$$
\begin{aligned}
x_1 &= -2^{-1} - 2^{-1}x_0\\
x_2 &= -2^{-1} - 2^{-1}x_1 = -2^{-1} + 2^{-2} + 2^{-2}x_0\\
x_3 &= -2^{-1} - 2^{-1}x_2 = -2^{-1} + 2^{-2} - 2^{-3} - 2^{-3}x_0\\
&\ \ \vdots
\end{aligned}
$$

Whatever the initial value $x_0$ is, this process will converge to the sum of a
geometric series with the ratio of (-1/2) as

$$
x_k \to \frac{a_0}{1 - r} = \frac{-1/2}{1 - (-1/2)} = -\frac{1}{3} = x^o\quad\text{as } k \to \infty
$$


and what is better, the limit is the very true solution to the given equation. 
We are happy with this, but might feel uneasy, because we are afraid that this 
convergence to the true solution is just a coincidence. Will it always converge, 
no matter how we modify the equation so that only x remains on the LHS? 

To answer this question, let us try another iterative scheme. 


$$
\begin{aligned}
x_{k+1} &= -2x_k - 1\\
x_1 &= -1 - 2x_0\\
x_2 &= -1 - 2x_1 = -1 - 2(-1 - 2x_0) = -1 + 2 + 2^2x_0\\
x_3 &= -1 - 2x_2 = -1 + 2 - 2^2 - 2^3x_0\\
&\ \ \vdots
\end{aligned}
$$

This iteration will diverge for any initial value $x_0$ other than the exact solution -1/3.
But we are never disappointed, since we know that no one can always be lucky.
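The contrast between the two rearrangements is easy to reproduce. The sketch below starts both schemes from 0: the first shrinks the distance to -1/3 by a factor of 1/2 per step, while the second multiplies it by -2 per step.

```matlab
% Compare the two rearrangements of 3x + 1 = 0 (exact solution -1/3).
x = 0;                  % converging scheme:  x_{k+1} = -x_k/2 - 1/2
for k = 1:50
  x = -x/2 - 1/2;
end
y = 0;                  % diverging scheme:   y_{k+1} = -2*y_k - 1
for k = 1:10
  y = -2*y - 1;
end
fprintf('after 50 steps: x = %.15f;  after 10 steps: y = %g\n', x, y);
```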

To understand the essential difference between these two cases, we should 
know the fixed-point theorem (Section 4.1). Apart from this, let’s go into a system 
of equations. 


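$$
\begin{bmatrix}3 & 2\\ 1 & 2\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix} = \begin{bmatrix}1\\ -1\end{bmatrix}
$$

Dividing the first equation by 3 and transposing all term(s) other than $x_1$ to the
RHS, and dividing the second equation by 2 and transposing all term(s) other
than $x_2$ to the RHS, we have

$$
\begin{bmatrix}x_{1,k+1}\\ x_{2,k+1}\end{bmatrix}
= \begin{bmatrix}0 & -2/3\\ -1/2 & 0\end{bmatrix}\begin{bmatrix}x_{1,k}\\ x_{2,k}\end{bmatrix}
+ \begin{bmatrix}1/3\\ -1/2\end{bmatrix},\qquad
x_{k+1} = \tilde{A}x_k + \tilde{b}
\tag{2.5.1}
$$

Assuming that this scheme works well, we set the initial value to zero ($x_0 = 0$)
and proceed as

$$
\begin{aligned}
x_1 &= \tilde{A}x_0 + \tilde{b} = \tilde{b}\\
x_2 &= \tilde{A}x_1 + \tilde{b} = [I + \tilde{A}]\tilde{b}\\
&\ \ \vdots\\
x_k &\to [I + \tilde{A} + \tilde{A}^2 + \cdots]\tilde{b} = [I - \tilde{A}]^{-1}\tilde{b}
= \begin{bmatrix}1 & 2/3\\ 1/2 & 1\end{bmatrix}^{-1}\begin{bmatrix}1/3\\ -1/2\end{bmatrix}
= \begin{bmatrix}1\\ -1\end{bmatrix}
\end{aligned}
\tag{2.5.2}
$$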
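Whether the series in Eq. (2.5.2) converges depends on the iteration matrix. A standard result, not stated in the text, is that $I + \tilde{A} + \tilde{A}^2 + \cdots$ converges (and the iteration converges for every starting point) exactly when the spectral radius of $\tilde{A}$, the largest eigenvalue magnitude, is below one. The sketch below checks this for the matrix of Eq. (2.5.1) and computes the limit $[I - \tilde{A}]^{-1}\tilde{b}$ directly.

```matlab
At = [0 -2/3; -1/2 0];  bt = [1/3; -1/2];   % iteration matrix and vector of Eq. (2.5.1)
x_limit = (eye(2) - At)\bt                  % limit of the series in Eq. (2.5.2)
rho = max(abs(eig(At)))                     % spectral radius; here sqrt(1/3) < 1
```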


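which will converge to the true solution $x^o = [1\ -1]^T$. This suggests another
method of solving a system of equations, which is called Jacobi iteration. It can
be generalized for an N × N matrix-vector equation as follows:

$$
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mm}x_m + \cdots + a_{mN}x_N = b_m
$$

$$
x_m^{(k+1)} = -\sum_{n \ne m}^{N}\frac{a_{mn}}{a_{mm}}x_n^{(k)} + \frac{b_m}{a_{mm}}\quad\text{for } m = 1, 2, \ldots, N
$$

$$
x_{k+1} = \tilde{A}x_k + \tilde{b}\quad\text{with}\quad
\tilde{A} = \begin{bmatrix}0 & -a_{12}/a_{11} & \cdots & -a_{1N}/a_{11}\\ -a_{21}/a_{22} & 0 & \cdots & -a_{2N}/a_{22}\\ \vdots & \vdots & \ddots & \vdots\\ -a_{N1}/a_{NN} & -a_{N2}/a_{NN} & \cdots & 0\end{bmatrix},\quad
\tilde{b} = \begin{bmatrix}b_1/a_{11}\\ b_2/a_{22}\\ \vdots\\ b_N/a_{NN}\end{bmatrix}
\tag{2.5.3}
$$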

This scheme is implemented by the following MATLAB routine “jacobi()”. 
We run it to solve the above equation. 


function X = jacobi(A,B,X0,kmax)
%This function finds a solution to Ax = B by Jacobi iteration.
if nargin < 4, tol = 1e-6; kmax = 100; %called by jacobi(A,B,X0)
elseif kmax < 1, tol = max(kmax,1e-16); kmax = 100; %jacobi(A,B,X0,tol)
else tol = 1e-6; %jacobi(A,B,X0,kmax)
end
if nargin < 3, X0 = zeros(size(B)); end
NA = size(A,1);
X = X0; At = zeros(NA,NA);
for m = 1:NA
  for n = 1:NA
    if n ~= m, At(m,n) = -A(m,n)/A(m,m); end
  end
  Bt(m,:) = B(m,:)/A(m,m);
end
for k = 1:kmax
  X = At*X + Bt; %Eq.(2.5.3)
  if nargout == 0, X, end %To see the intermediate results
  if norm(X - X0)/(norm(X0) + eps) < tol, break; end
  X0 = X;
end


>>A = [3 2; 1 2]; b = [1 -1]'; %the coefficient matrix and RHS vector
>>x0 = [0 0]'; %the initial value
>>x = jacobi(A,b,x0,20) %to repeat 20 iterations starting from x0
x =  1.0000
    -1.0000
>>jacobi(A,b,x0,20) %omit output argument to see intermediate results
X =  0.3333   0.6667   0.7778   0.8889   0.9259  ...
    -0.5000  -0.6667  -0.8333  -0.8889  -0.9444  ...
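The double loop in “jacobi()” that builds At and Bt can also be written in vectorized form using the diagonal part D of A: $\tilde{A} = -D^{-1}(A - D)$ and $\tilde{b} = D^{-1}b$. The sketch below is an alternative formulation, not the book's routine; its stopping rule (relative change below 1e-10, at most 100 iterations) is an arbitrary choice.

```matlab
A = [3 2; 1 2];  b = [1; -1];
D  = diag(diag(A));            % diagonal part of A
At = -D\(A - D);               % entries -a_mn/a_mm off the diagonal, 0 on it
bt = D\b;                      % entries b_m/a_mm
x = zeros(2,1);
for k = 1:100
  x_new = At*x + bt;           % Eq. (2.5.3)
  if norm(x_new - x) < 1e-10*(norm(x) + eps), x = x_new; break; end
  x = x_new;
end
fprintf('Jacobi: x = [%g %g] after %d iterations\n', x, k);
```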


2.5.2 Gauss-Seidel Iteration 

Let us take a close look at Eq. (2.5.1). Each iteration of the Jacobi method updates
the whole set of N variables at a time. However, so long as we do not use a
multiprocessor computer capable of parallel processing, each one of the N variables
is updated sequentially, one by one. Therefore, it is no wonder that we could
speed up the convergence by using all the most recent values of the variables for
updating each variable, even in the same iteration, as follows:

$$
\begin{aligned}
x_{1,k+1} &= -\frac{2}{3}x_{2,k} + \frac{1}{3}\\
x_{2,k+1} &= -\frac{1}{2}x_{1,k+1} - \frac{1}{2}
\end{aligned}
$$

This scheme is called Gauss-Seidel iteration, which can be generalized for an
N × N matrix-vector equation as follows:

$$
x_m^{(k+1)} = \frac{b_m - \sum_{n=1}^{m-1}a_{mn}x_n^{(k+1)} - \sum_{n=m+1}^{N}a_{mn}x_n^{(k)}}{a_{mm}}
\quad\text{for } m = 1, \ldots, N \text{ and for each time stage } k
\tag{2.5.4}
$$


This is implemented in the following MATLAB routine “gauseid()”, which
we will use to solve the above equation.


function X = gauseid(A,B,X0,kmax)
%This function finds x = A^-1*B by Gauss-Seidel iteration.
if nargin < 4, tol = 1e-6; kmax = 100;
elseif kmax < 1, tol = max(kmax,1e-16); kmax = 1000;
else tol = 1e-6;
end
if nargin < 3, X0 = zeros(size(B)); end
NA = size(A,1); X = X0;
for k = 1:kmax
  X(1,:) = (B(1,:) - A(1,2:NA)*X(2:NA,:))/A(1,1);
  for m = 2:NA-1
    tmp = B(m,:) - A(m,1:m - 1)*X(1:m - 1,:) - A(m,m + 1:NA)*X(m + 1:NA,:);
    X(m,:) = tmp/A(m,m); %Eq.(2.5.4)
  end
  X(NA,:) = (B(NA,:) - A(NA,1:NA - 1)*X(1:NA - 1,:))/A(NA,NA);
  if nargout == 0, X, end %To see the intermediate results
  if norm(X - X0)/(norm(X0) + eps) < tol, break; end
  X0 = X;
end


>>A = [3 2; 1 2]; b = [1 -1]'; %the coefficient matrix and RHS vector
>>x0 = [0 0]'; %the initial value
>>gauseid(A,b,x0,10) %omit output argument to see intermediate results
X = 0.3333 0.7778 0.9259 0.9753 0.9918 ...
   -0.6667 -0.8889 -0.9630 -0.9877 -0.9959 ...

As with the Jacobi iteration in the previous section, we can see this Gauss-Seidel iteration converging to the true solution $x^o = [1\ \ -1]^T$, and with fewer iterations. But if we use a multiprocessor computer capable of parallel processing, the Jacobi iteration may be faster even with more iterations, since it can exploit simultaneous parallel computation.

Note that the Jacobi/Gauss-Seidel iterative scheme seems unattractive and even unreasonable if we are given a standard form of linear equations as

Ax = b

because the computational overhead for converting it into the form of Eq. (2.5.3) may be excessive. But this is not always the case, especially when the equations are given in the form of Eq. (2.5.3)/(2.5.4). In such a case, we simply repeat the iterations without having to use ready-made routines such as “jacobi()” or “gauseid()”. Let us see the following example.

Example 2.4. Jacobi or Gauss-Seidel Iterative Scheme. Suppose the temperature of a metal rod of length 10 m has been measured to be 0°C and 10°C at each end, respectively. Find the temperatures $x_1$, $x_2$, $x_3$, and $x_4$ at the four points equally spaced with the interval of 2 m, assuming that the temperature at each point is the average of the temperatures of both neighboring points.

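We can formulate this problem as a system of equations:

$$x_1 = \frac{x_0 + x_2}{2}, \quad x_2 = \frac{x_1 + x_3}{2}, \quad x_3 = \frac{x_2 + x_4}{2}, \quad x_4 = \frac{x_3 + x_5}{2} \quad \text{with } x_0 = 0 \text{ and } x_5 = 10 \qquad \text{(E2.4)}$$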

This can easily be cast into Eq. (2.5.3) or Eq. (2.5.4) as programmed in the following program “nm2e04.m”:


%nm2e04
N = 4; %the number of unknown variables/equations
kmax = 20; tol = 1e-6;
At = [0 1 0 0; 1 0 1 0; 0 1 0 1; 0 0 1 0]/2;
x0 = 0; x5 = 10; %boundary values
b = [x0/2 0 0 x5/2]'; %RHS vector
%initialize all the values to the average of boundary values
xp = ones(N,1)*(x0 + x5)/2;
%Jacobi iteration
for k = 1:kmax
  x = At*xp + b; %Eq.(E2.4)
  if norm(x - xp)/(norm(xp) + eps) < tol, break; end
  xp = x;
end
k, xj = x
%Gauss-Seidel iteration
xp = ones(N,1)*(x0 + x5)/2; x = xp; %initial value
for k = 1:kmax
  for n = 1:N, x(n) = At(n,:)*x + b(n); end %Eq.(E2.4)
  if norm(x - xp)/(norm(xp) + eps) < tol, break; end
  xp = x;
end
k, xg = x
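Since x = At*x + b is a linear fixed-point equation, it is equivalent to (I - At)x = b, so the values xj and xg produced by nm2e04.m can be cross-checked against a direct solve. The sketch below assumes the same At and b as nm2e04.m; because each interior temperature is the average of its neighbors, the exact answer is the straight-line profile 2, 4, 6, 8 (°C).

```matlab
% Direct cross-check of Example 2.4: x = At*x + b  <=>  (I - At)*x = b
N = 4;
At = [0 1 0 0; 1 0 1 0; 0 1 0 1; 0 0 1 0]/2;
x0 = 0; x5 = 10;               % boundary temperatures
b = [x0/2 0 0 x5/2]';          % RHS vector, as in nm2e04.m
x_direct = (eye(N) - At)\b     % should be the linear profile [2; 4; 6; 8]
```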
The following example illustrates that the Jacobi iteration and the Gauss-Seidel iteration can also be used for solving a system of nonlinear equations, although there is no guarantee that they will work for every nonlinear equation.


Example 2.5. Gauss-Seidel Iteration for Solving a Set of Nonlinear Equations. 

We are going to use the Gauss-Seidel iteration to solve a system of nonlinear equations as

$$x_1^2 + 10x_1 + 2x_2^2 - 13 = 0$$
$$2x_1^3 - x_2^2 + 5x_2 - 6 = 0 \qquad \text{(E2.5.1)}$$


In order to do so, we convert these equations into the following form, which suits the Gauss-Seidel scheme.

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} (13 - x_1^2 - 2x_2^2)/10 \\ (6 - 2x_1^3 + x_2^2)/5 \end{bmatrix} \qquad \text{(E2.5.2)}$$


We make the MATLAB program “nm2e05.m”, which uses the Gauss-Seidel iteration to solve these equations. Interested readers are encouraged to run this program to see that this simple iteration yields the solution within the given tolerance of error in just six steps. How marvelous it is to solve a system of nonlinear equations without any special algorithm!


(cf) Due to its remarkable capability to deal with a system of nonlinear equations, the 
Gauss-Seidel iterative method plays an important role in solving partial differential 
equations (see Chapter 9). 


%nm2e05.m
% use Gauss-Seidel iteration to solve a set of nonlinear equations
clear
kmax = 100; tol = 1e-6;
x = zeros(2,1); %initial value
for k = 1:kmax
  xp = x; % to remember the previous solution
  x(1) = (13 - x(1)^2 - 2*x(2)^2)/10; % (E2.5.2)
  x(2) = (6 - x(1)^3)/5;
  if norm(x - xp)/(norm(xp) + eps) < tol, break; end
end
k, x


2.5.3 The Convergence of Jacobi and Gauss-Seidel Iterations 

Jacobi and Gauss-Seidel iterations have a very simple computational structure because they do not need any matrix inversion. So they may be of practical use, provided that their convergence is guaranteed. However, convergence is not always assured, as illustrated in Section 2.5.1. Then, what is the convergence condition? It is the diagonal dominance of the coefficient matrix A, which is stated as follows:

$$|a_{mm}| > \sum_{n=1,\, n \neq m}^{N} |a_{mn}| \quad \text{for } m = 1, 2, \ldots, N \qquad (2.5.5)$$

This implies that the convergence of the iterative schemes is ensured if, in 
each row of coefficient matrix A, the absolute value of the diagonal element 
is greater than the sum of the absolute values of the other elements. It should 
be noted, however, that this is a sufficient, not a necessary, condition. In other 
words, the iterative scheme may work even if the above condition is not strictly 
satisfied. 
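The condition (2.5.5) is easy to test numerically. The sketch below uses an anonymous function isdiagdom (our own helper, not a MATLAB built-in) and applies it to the coefficient matrix used with jacobi() and gauseid() above, and to the same two equations with their rows swapped. A false result only means that this sufficient condition fails; as noted above, the iteration may still converge.

```matlab
% Test the strict row diagonal dominance condition (2.5.5):
% |a_mm| > sum over n ~= m of |a_mn|, for every row m.
isdiagdom = @(A) all(abs(diag(A)) > sum(abs(A), 2) - abs(diag(A)));
A1 = [3 2; 1 2];     % coefficient matrix used with jacobi() and gauseid()
A2 = [1 2; 3 2];     % the same two equations with the rows swapped
d1 = isdiagdom(A1)   % true:  3 > 2 and 2 > 1
d2 = isdiagdom(A2)   % false: 1 < 2 in the first row
```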

One thing to note is the relaxation technique, which may be helpful in accelerating the convergence of Gauss-Seidel iteration. It is a slight modification of Eq. (2.5.4) as

$$x_m^{(k+1)} = (1 - \omega)x_m^{(k)} + \omega\,\frac{b_m - \sum_{n=1}^{m-1} a_{mn}x_n^{(k+1)} - \sum_{n=m+1}^{N} a_{mn}x_n^{(k)}}{a_{mm}}$$

and is called SOR (successive overrelaxation) for the relaxation factor $1 < \omega < 2$ and successive underrelaxation for $0 < \omega < 1$. But regrettably, there is no general rule for selecting the optimal value of the relaxation factor $\omega$.
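A minimal sketch of Gauss-Seidel iteration with relaxation, applied element by element to the 2 x 2 system used with gauseid() above. The relaxation factor w = 1.1 is merely an illustrative value, not a recommendation, and the script structure is ours rather than a routine from the text; setting w = 1 reduces it to the plain Gauss-Seidel iteration (2.5.4).

```matlab
% Gauss-Seidel iteration with relaxation (SOR when 1 < w < 2), element by element.
A = [3 2; 1 2]; b = [1; -1];   % same system as in the jacobi()/gauseid() examples
w = 1.1;                       % relaxation factor (illustrative value only)
N = size(A,1); x = zeros(N,1); % initial guess
tol = 1e-6; kmax = 100;
for k = 1:kmax
  xp = x;                      % previous iterate
  for m = 1:N
    % plain Gauss-Seidel value, Eq. (2.5.4): new x(1:m-1), old x(m+1:N)
    xgs = (b(m) - A(m,1:m-1)*x(1:m-1) - A(m,m+1:N)*x(m+1:N))/A(m,m);
    x(m) = (1 - w)*x(m) + w*xgs;   % relaxed update
  end
  if norm(x - xp)/(norm(xp) + eps) < tol, break; end
end
k, x
```

The empty products A(m,1:0)*x(1:0) and A(N,N+1:N)*x(N+1:N) evaluate to 0 in MATLAB, so the first and last rows need no special case.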


PROBLEMS 

2.1 Recursive Least-Squares Estimation (RLSE) 

(a) Run the program ‘do_rlse.m’ (in Section 2.1.4) with another value of 
the true parameter 

xo = [1 2]'

What is the parameter estimate obtained from the RLS solution? 

(b) Run the program “do_rlse” with a small matrix P like 

P = 0.01*eye(NA); 

What is the parameter estimate obtained from the RLS solution? Is it 
still close to the value of the true parameter? 

(c) Insert the statements in the following box at appropriate places in the MATLAB code “do_rlse.m” that appeared in Section 2.1.4. Remove the last two statements and run it to compare the times required for using the RLS solution and the standard LS solution to get the parameter estimates on-line.
%nm2p01.m

time_on = 0; time_off = 0;

tic

time_on = time_on + toc;
tic
xk_off = A\b; %standard LS solution
time_off = time_off + toc;

solutions = [x xk_off]
discrepancy = norm(x - xk_off)
times = [time_on time_off]


2.2 Delicacy of Scaled Partial Pivoting 

As a complement to Example 2.2, we want to compare no pivoting, partial pivoting, scaled partial pivoting, and full pivoting in order to taste the delicacy of the row switching strategy. To do it in a systematic way, add the third input argument (pivoting) to the Gauss elimination routine ‘gauss()’ and modify its contents by inserting the following statements into appropriate places so that the new routine “gauss(A,b,pivoting)” implements the partial pivoting procedure optionally depending on the value of ‘pivoting’. You can also remove any unnecessary parts.

- if nargin < 3, pivoting = 2; end %scaled partial pivoting by default

- switch pivoting
    case 2, [akx,kx] = max(abs(AB(k:NA,k))./...
        max(abs([AB(k:NA,k + 1:NA) eps*ones(NA - k + 1,1)]'))');
    otherwise, [akx,kx] = max(abs(AB(k:NA,k))); %partial pivoting
  end

- & pivoting > 0 %partial pivoting not to be done for pivoting = 0

(a) Use this routine with pivoting = 0/1/2, the ‘\’ operator, and the ‘inv()’ command to solve the systems of linear equations with the coefficient matrices and the RHS vectors shown below and fill in Table P2.2 with the residual error $\|A_ix - b_i\|$ to compare the results in terms of how well the solutions satisfy the equation, that is, $\|A_ix - b_i\| \approx 0$.
Table P2.2 Comparison of gauss() with Different Pivoting Methods in Terms of ||A_i x - b_i||

Columns: A_1x = b_1, A_2x = b_2, A_3x = b_3, A_4x = b_4
Rows: gauss(A,b,0) (no pivoting), gauss(A,b,1) (partial pivoting), gauss(A,b,2) (scaled partial pivoting), A\b


1.25e-01 

4.44e-16 

0 

6.25e-02 



(b) Which pivoting strategy yields the worst result for problem (1) in (a)? Has the row swapping been done during the process of partial pivoting and scaled partial pivoting? If yes, did it work to our advantage? Did the ‘\’ operator or the ‘inv()’ command give you any better result?

(c) Which pivoting strategy yields the worst result for problem (2) in (a)? 
Has the row swapping been done during the process of partial pivoting 
and scaled partial pivoting? If yes, did it produce a positive effect for 
this case? Did the ‘\’ operator or the ‘inv()’ command give you any 
better result? 

(d) Which pivoting strategy yields the best result for problem (3) in (a)? Has 
the row swapping been done during the process of partial pivoting and 
scaled partial pivoting? If yes, did it produce a positive effect for this 
case? 

(e) The coefficient matrix $A_3$ is the same as would be obtained by applying the full pivoting scheme for $A_1$ to have the largest pivot element. Does the full pivoting give a better result than no pivoting or the (scaled) partial pivoting?

(f) Which pivoting strategy yields the best result for problem (4) in (a)? Has 
the row swapping been done during the process of partial pivoting and 
scaled partial pivoting? If yes, did it produce a positive effect for this 
case? Did the ‘\’ operator or the ‘inv() ’ command give you any better 
result? 

2.3 Gauss-Jordan Elimination Algorithm Versus Gauss Elimination Algorithm 

Gauss-Jordan elimination algorithm mentioned in Section 2.2.3 reduces the coefficient matrix A to an identity matrix and then takes the RHS vector/matrix as the solution, while Gauss elimination algorithm introduced with the corresponding routine “gauss()” in Section 2.2.1 makes the matrix an upper-triangular one and performs backward substitution to get the solution. Since Gauss-Jordan elimination algorithm does not need backward substitution, it seems to be simpler than Gauss elimination algorithm.
Table P2.3 Comparison of Several Methods for Solving a Set of Linear Equations

                gauss(A,b)    gaussj(A,b)   A\b           A^-1*b
||Ax_i - b||    3.1402e-016                 8.7419e-016
# of flops      1124          1744          785           7670



(a) Modify the routine “gauss()” into a routine “gaussj()” which implements Gauss-Jordan elimination algorithm and count the number of multiplications consumed by the routine, excluding those required for partial pivoting. Compare it with the number of multiplications consumed by “gauss()” [Eq. (2.2.18)]. Does it support or contradict our expectation that Gauss-Jordan elimination would take fewer computations than Gauss elimination?

(b) Use both of the routines, the ‘\’ operator, and the ‘inv()’ command or ‘A^-1’ to solve the system of linear equations

Ax = b (P2.3.1)

where A is the 10-dimensional Hilbert matrix (see Example 2.3) and $b = Ax^o$ with $x^o = [1\ 1\ 1\ 1\ 1\ 1\ 1\ 1\ 1\ 1]^T$. Fill in Table P2.3 with the residual errors

$$\|Ax_i - b\| \approx 0 \qquad (P2.3.2)$$

as a way of describing how well each solution satisfies the equation.

(cf) The numbers of floating-point operations required for carrying out the computations are listed in Table P2.3 so that readers can compare the computational loads of different approaches. Those data were obtained by using the MATLAB command flops(), which is available only in MATLAB versions earlier than 6.0.

2.4 Tridiagonal System of Linear Equations 

Consider the following system of linear equations: 


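$$\begin{aligned} a_{11}x_1 + a_{12}x_2 &= b_1 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2 \\ &\;\;\vdots \\ a_{N-1,N-2}x_{N-2} + a_{N-1,N-1}x_{N-1} + a_{N-1,N}x_N &= b_{N-1} \\ a_{N,N-1}x_{N-1} + a_{NN}x_N &= b_N \end{aligned} \qquad (P2.4.1)$$

which can be written in a compact form by using a matrix-vector notation as

$$A_{N \times N}\,x = b \qquad (P2.4.2)$$
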
Table P2.4 The Computational Load of the Methods to Solve a Tridiagonal System of Equations

              gauss(A,b)   trid(A,b)   gauseid()   gauseid1()   A\b
# of flops    141          50          2615        2082         94

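where

$$A_{N \times N} = \begin{bmatrix} a_{11} & a_{12} & 0 & \cdots & 0 & 0 \\ a_{21} & a_{22} & a_{23} & \cdots & 0 & 0 \\ 0 & a_{32} & a_{33} & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & \cdots & a_{N-1,N-2} & a_{N-1,N-1} & a_{N-1,N} \\ 0 & 0 & \cdots & 0 & a_{N,N-1} & a_{NN} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{N-1} \\ x_N \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_{N-1} \\ b_N \end{bmatrix}$$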


This is called a tridiagonal system of equations because the coefficient matrix A has nonzero elements only on its main diagonal and super-/subdiagonals.

(a) Modify the Gauss elimination routine “gauss()” (Section 2.2.1) in such a way that this special structure can be exploited for reducing the computational burden. Give the name ‘trid()’ to the modified routine and save it in an m-file named “trid.m” for future use.

(b) Modify the Gauss-Seidel iteration routine “gauseid()” (Section 2.5.2) in such a way that this special structure can be exploited for reducing the computational burden. Let the name of the modified routine be “gauseid1()”.

(c) Noting that Eq. (E2.4) in Example 2.4 can be trimmed into a tridiagonal structure as (P2.4.2), use the routines “gauss()”, “trid()”, “gauseid()”, “gauseid1()”, and the backslash (\) operator to solve the problem.

(cf) The numbers of floating-point operations required for carrying out the computations are listed in Table P2.4 so that readers can compare the computational loads of the different approaches.

2.5 LU Decomposition of a Tridiagonal Matrix 

Modify the LU decomposition routine “lu_dcmp()” (Section 2.4.1) in such a way that the tridiagonal structure can be exploited for reducing the computational burden. Give the name “lu_trid()” to the modified routine and use it to get the LU decomposition of the tridiagonal matrix

$$A = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} \qquad (P2.5.1)$$

You may type the following statements into the MATLAB command window:

>>A = [2 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 2];
>>[L,U] = lu_trid(A)
>>L*U - A % = 0 (No error)?


2.6 LS Solution by Backslash Operator and QR Decomposition 

The backslash (‘A\b’) operator and the matrix left division (‘mldivide(A,b)’) function turn out to be the most efficient means for solving a system of linear equations as Eq. (P2.3.1). They are also capable of dealing with the under/over-determined cases. Let’s see how they handle the under/over-determined cases.

(a) For an underdetermined system of linear equations 


$$A_1x = b_1, \quad \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 14 \\ 32 \end{bmatrix} \qquad (P2.6.1)$$

find the minimum-norm solution (2.1.7) and the solutions that can be obtained by typing the following statements in the MATLAB command window:

>>A1 = [1 2 3; 4 5 6]; b1 = [14 32]';
>>x_mn = A1'*(A1*A1')^-1*b1, x_pi = pinv(A1)*b1, x_bs = A1\b1

Are the three solutions the same? 

(b) For another underdetermined system of linear equations 


$$A_2x = b_2, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \qquad (P2.6.2)$$

find the solutions by using Eq. (2.1.7), the command pinv(), and backslash (\). If you are not pleased with the result obtained from Eq. (2.1.7), you can remove one of the two rows from the coefficient matrix $A_2$ and try again. Identify the minimum-norm solution(s). Are the equations redundant or inconsistent?
Table P2.6.1 Comparison of Several Methods for Computing the LS Solution

                QR            LS: Eq. (2.1.10)   pinv(A)*b     A\b
||Ax_i - b||    2.8788e-016                      2.8788e-016
# of flops      25            89                 196           92


(c) For another underdetermined system of linear equations 


$$A_2x = b_3, \quad A_2\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 21 \\ 21 \end{bmatrix} \qquad (P2.6.3)$$

find the solutions by using Eq. (2.1.7), the command pinv(), and backslash (\). Does any of them satisfy Eq. (P2.6.3) closely? Are the equations redundant or inconsistent?

(d) For an overdetermined system of linear equations 


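$$A_4x = b_4, \quad \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 4 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5.2 \\ 7.8 \\ 2.2 \end{bmatrix} \qquad (P2.6.4)$$

find the LS (least-squares) solution (2.1.10), which can be obtained from the following statements. Fill in the corresponding blanks of Table P2.6.1 with the results.

>>A4 = [1 2; 2 3; 4 -1]; b4 = [5.2 7.8 2.2]';
>>x_ls = (A4'*A4)\A4'*b4, x_pi = pinv(A4)*b4, x_bs = A4\b4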

(e) We can use QR decomposition to solve a system of linear equations as Eq. (P2.3.1), where the coefficient matrix A is square and nonsingular or rectangular with the row dimension greater than the column dimension. The procedure is explained as follows:

$$Ax = QRx = b, \quad Rx = Q^{-1}b = Q^Tb, \quad x = R^{-1}Q^Tb \qquad (P2.6.5)$$

Note that $Q^TQ = I$, that is, $Q^T = Q^{-1}$ (orthogonality), and the premultiplication of $R^{-1}$ can be performed by backward substitution, because R is an upper-triangular matrix. You are not supposed to count the number of floating-point operations needed for obtaining the LU and QR decompositions, assuming that they are available.
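The procedure (P2.6.5) can be sketched with MATLAB's built-in qr(). The economy-size call qr(A,0) is used here so that R is square (for a tall A, Q then has orthonormal columns, so Q'*Q = I even though Q*Q' is not the identity), and the backward substitution is written out explicitly. The data are those of Eq. (P2.6.4); the sketch illustrates the steps only and does not count flops.

```matlab
% Solve A*x = b via QR as in Eq. (P2.6.5): R*x = Q'*b, then backward substitution.
A = [1 2; 2 3; 4 -1]; b = [5.2; 7.8; 2.2];   % data of Eq. (P2.6.4)
[Q, R] = qr(A, 0);      % economy-size QR: Q is 3x2 with Q'*Q = I, R is 2x2 upper triangular
y = Q'*b;
n = size(R, 2);
x_qr = zeros(n, 1);
for k = n:-1:1          % backward substitution on the upper triangular R
  x_qr(k) = (y(k) - R(k,k+1:n)*x_qr(k+1:n))/R(k,k);
end
x_qr
x_bs = A\b;             % backslash LS solution, for comparison
discrepancy = norm(x_qr - x_bs)
```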

(i) Apply the QR decomposition, the LU decomposition, Gauss elimination, and the backslash (\) operator to solve the system of linear equations whose coefficient matrix is the 10-dimensional Hilbert matrix (see Example 2.3) and fill in the corresponding blanks of Table P2.6.2 with the results.

Table P2.6.2 Comparison of Several Methods for Solving a System of Linear Equations

                LU            QR            gauss(A,b)    A\b
||Ax_i - b||                  7.8505e-016                 8.7419e-016
# of flops      453           327           1124          785

(ii) Apply the QR decomposition to solve the system of linear equations 
given by Eq. (P2.6.4) and fill in the corresponding blanks of 
Table P2.6.2 with the results. 

(cf) This problem illustrates that QR decomposition is quite useful for solving 
a system of linear equations, where the coefficient matrix A is square and 
nonsingular or rectangular with the row dimension greater than the column 
dimension and no rank deficiency. 

2.7 Cholesky Factorization of a Symmetric Positive Definite Matrix: 

If a matrix A is symmetric and positive definite, we can find its LU 
decomposition such that the upper triangular matrix U is the transpose of 
the lower triangular matrix L, which is called Cholesky factorization. 
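Before writing your own routine, the defining property can be checked with MATLAB's built-in chol(), which returns an upper triangular U with U'*U = A for a symmetric positive definite A. The 3 x 3 matrix below is an arbitrary illustrative example (its leading principal minors 4, 16, and 124 are all positive), not one taken from the text.

```matlab
% Check U'*U = A with MATLAB's built-in chol() for a small SPD matrix.
A = [4 2 0; 2 5 3; 0 3 10];   % illustrative symmetric positive definite matrix
U = chol(A)                   % upper triangular Cholesky factor
err = norm(U'*U - A)          % should be at round-off level
isUpper = isequal(U, triu(U)) % true: all entries below the diagonal are zero
```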

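Consider the Cholesky factorization procedure for a 4 x 4 matrix

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{12} & a_{22} & a_{23} & a_{24} \\ a_{13} & a_{23} & a_{33} & a_{34} \\ a_{14} & a_{24} & a_{34} & a_{44} \end{bmatrix} = \begin{bmatrix} u_{11} & 0 & 0 & 0 \\ u_{12} & u_{22} & 0 & 0 \\ u_{13} & u_{23} & u_{33} & 0 \\ u_{14} & u_{24} & u_{34} & u_{44} \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} & u_{14} \\ 0 & u_{22} & u_{23} & u_{24} \\ 0 & 0 & u_{33} & u_{34} \\ 0 & 0 & 0 & u_{44} \end{bmatrix}$$

$$= \begin{bmatrix} u_{11}^2 & u_{11}u_{12} & u_{11}u_{13} & u_{11}u_{14} \\ u_{12}u_{11} & u_{12}^2 + u_{22}^2 & u_{12}u_{13} + u_{22}u_{23} & u_{12}u_{14} + u_{22}u_{24} \\ u_{13}u_{11} & u_{13}u_{12} + u_{23}u_{22} & u_{13}^2 + u_{23}^2 + u_{33}^2 & u_{13}u_{14} + u_{23}u_{24} + u_{33}u_{34} \\ u_{14}u_{11} & u_{14}u_{12} + u_{24}u_{22} & u_{14}u_{13} + u_{24}u_{23} + u_{34}u_{33} & u_{14}^2 + u_{24}^2 + u_{34}^2 + u_{44}^2 \end{bmatrix} \qquad (P2.7.1)$$

Equating every row of the matrices on both sides yields

$$u_{11} = \sqrt{a_{11}}, \quad u_{12} = a_{12}/u_{11}, \quad u_{13} = a_{13}/u_{11}, \quad u_{14} = a_{14}/u_{11} \qquad (P2.7.2.1)$$

$$u_{22} = \sqrt{a_{22} - u_{12}^2}, \quad u_{23} = (a_{23} - u_{13}u_{12})/u_{22}, \quad u_{24} = (a_{24} - u_{14}u_{12})/u_{22} \qquad (P2.7.2.2)$$

$$u_{33} = \sqrt{a_{33} - u_{13}^2 - u_{23}^2}, \quad u_{34} = (a_{34} - u_{24}u_{23} - u_{14}u_{13})/u_{33} \qquad (P2.7.2.3)$$

$$u_{44} = \sqrt{a_{44} - u_{14}^2 - u_{24}^2 - u_{34}^2} \qquad (P2.7.2.4)$$

These formulas can be generalized for an N x N matrix as

$$u_{kk} = \sqrt{a_{kk} - \sum_{i=1}^{k-1} u_{ik}^2} \quad \text{for } k = 1:N \qquad (P2.7.3a)$$

$$u_{k,m} = \Big(a_{km} - \sum_{i=1}^{k-1} u_{im}u_{ik}\Big)\Big/u_{kk} \quad \text{for } m = k+1:N \text{ and } k = 1:N \qquad (P2.7.3b)$$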

(a) Make a MATLAB routine “cholesky()”, which implements these formulas to perform Cholesky factorization.

(b) Try your routine “cholesky()” for the following matrix and check if $U^TU - A \approx 0$ (U: the upper triangular matrix). Compare the result with that obtained by using the MATLAB built-in routine “chol()”.

$$A = \begin{bmatrix} 1 & 2 & 4 & 7 \\ 2 & 13 & 23 & 38 \\ 4 & 23 & 77 & 122 \\ 7 & 38 & 122 & 294 \end{bmatrix} \qquad (P2.7.4)$$

(c) Use the routine “lu_dcmp()” and the MATLAB built-in routine “lu()” to get the LU decomposition for the above matrix (P2.7.4) and check if $P^TLU - A \approx 0$, where L and U are the lower/upper triangular matrices, respectively. Compare the result with that obtained by using the MATLAB built-in routine “lu()”.

2.8 Usage of SVD (Singular Value Decomposition) 

What is SVD good for? Suppose we have the singular value decomposition of an M x N real-valued matrix A as

$$A = USV^T \qquad (P2.8.1)$$

where U is an orthogonal M x M matrix, V an orthogonal N x N matrix, and S a real diagonal M x N matrix having the singular values $\sigma_i$ of A (the square roots of the eigenvalues of $A^TA$) in decreasing order on its diagonal. Then, it is possible to improvise the pseudo-inverse even in the case of rank-deficient matrices (with rank(A) < min(M, N)) for which the left/right pseudo-inverse can’t be found. The virtual pseudo-inverse can be written as

$$A^{+} = \hat{V}\hat{S}^{-1}\hat{U}^T \qquad (P2.8.2)$$

where $\hat{S}^{-1}$ is the diagonal matrix having $1/\sigma_i$ on its diagonal that is reconstructed by removing all-zero(-like) rows/columns of the matrix S and substituting $1/\sigma_i$ for $\sigma_i \neq 0$ into the resulting matrix; $\hat{V}$ and $\hat{U}$ are reconstructed by removing the columns of V and U corresponding to the zero singular value(s). Consequently, SVD has a specialty in dealing with the singular cases. Let us take a closer look at this through the following problems.
(a) Consider the problem of solving

$$A_1x = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 6 \\ 12 \end{bmatrix} = b_1 \qquad (P2.8.3)$$

Since this belongs to the underdetermined case (M = 2 < 3 = N), it seems that we can use Eq. (2.1.7) to find the minimum-norm solution.

(i) Type the following statements into the MATLAB command window.

>>A1 = [1 2 3; 2 4 6]; b1 = [6; 12]; x = A1'*(A1*A1')^-1*b1 %Eq. (2.1.7)

What is the result? Explain why it is so and support your answer by typing

>>r = rank(A1)

(ii) Type the following statements into the MATLAB command window to see the SVD-based minimum-norm solution. What is the value of $x = A_1^{+}b_1 = \hat{V}\hat{S}^{-1}\hat{U}^Tb_1$ and $\|A_1x - b_1\|$?

[U,S,V] = svd(A1); %(P2.8.1)
u = U(:,1:r); v = V(:,1:r); s = S(1:r,1:r);
A1p = v*diag(1./diag(s))*u'; %faked pseudo-inverse (P2.8.2)
x = A1p*b1 %minimum-norm solution for singular underdetermined
err = norm(A1*x - b1) %residual error

(iii) To see that the norm of this solution is less than that of any other solution which can be obtained by adding any vector in the null space of the coefficient matrix $A_1$, type the following statements into the MATLAB command window. What is implied by the result?

nullA = null(A1); normx = norm(x);
for n = 1:1000
  if norm(x + nullA*(rand(size(nullA,2),1) - 0.5)) < normx
    disp('What the hell smaller-norm sol - not minimum norm');
  end
end


(b) For the problem 


'12 3' 
X= U 3 4 


Xl 

X 2 

.* 3 . 



(P2.8.4) 


compare the minimum-norm solution based on SVD and that obtained 
by Eq. (2.1.7). 
(c) Consider the problem of solving

$$A_3x = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 9 \\ 7 & 11 & 18 \\ -2 & 3 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} = b_3 \qquad (P2.8.5)$$

Since this belongs to the overdetermined case (M = 4 > 3 = N), it seems that we can use Eq. (2.1.10) to find the LS (least-squares) solution.

(i) Type the following statements into the MATLAB command window:

>>A3 = [1 2 3; 4 5 9; 7 11 18; -2 3 1];
>>b3 = [1;2;3;4]; x = (A3'*A3)^-1*A3'*b3 %Eq. (2.1.10)

What is the result? Explain why it is so in connection with the rank of $A_3$.

(ii) Similarly to (a)(ii), find the SVD-based least-squares solution.

r = rank(A3);
[U,S,V] = svd(A3);
u = U(:,1:r); v = V(:,1:r); s = S(1:r,1:r);
A3p = v*diag(1./diag(s))*u'; x = A3p*b3

(iii) To see that the residual error of this solution is less than that of any other vector around it, type the following statements into the MATLAB command window. What is implied by the result?

err = norm(A3*x - b3)
for n = 1:1000
  if norm(A3*(x + rand(size(x)) - 0.5) - b3) < err
    disp('What the hell smaller error sol - not LSE?');
  end
end


(d) For the problem 


A 4 x = 



(P2.8.6) 


compare the LS solution based on SVD and that obtained by Eq. (2.1.10). 
(cf) This problem illustrates that SVD can be used for fabricating a universal 
solution of a set of linear equations, minimum-norm or least-squares, for 
all the possible rank deficiency of the coefficient matrix A. 
2.9 Gauss-Seidel Iterative Method with Relaxation Technique

(a) Try the relaxation technique (introduced in Section 2.5.3) with several values of the relaxation factor ω = 0.2, 0.4, ..., 1.8 for the following problems. Find the best one among these values of the relaxation factor for each problem, together with the number of iterations required for satisfying the termination criterion $\|x_{k+1} - x_k\|/\|x_k\| < 10^{-6}$.



(iii) The nonlinear equations (E2.5.1) given in Example 2.5. 


(P2.9.1) 

(P2.9.2) 


(b) Which of the two matrices $A_1$ and $A_2$ has stronger diagonal dominance in the above equations? For which equation does Gauss-Seidel iteration converge faster, Eq. (P2.9.1) or Eq. (P2.9.2)? What would you conjecture about the relationship between the convergence speed of Gauss-Seidel iteration for a set of linear equations and the diagonal dominance of the coefficient matrix A?

(c) Is the relaxation technique always helpful for improving the convergence speed of the Gauss-Seidel iterative method regardless of the value of the relaxation factor ω?





INTERPOLATION 
AND CURVE FITTING 


There are two topics to be dealt with in this chapter, namely, interpolation¹ and
curve fitting. Interpolation is to connect discrete data points in a plausible way 
so that one can get reasonable estimates of data points between the given points. 
The interpolation curve goes through all data points. Curve fitting, on the other 
hand, is to find a curve that could best indicate the trend of a given set of data. 
The curve does not have to go through the data points. In some cases, the data 
may have different accuracy/reliability/uncertainty and we need the weighted 
least-squares curve fitting to process such data. 

3.1 INTERPOLATION BY LAGRANGE POLYNOMIAL 

For a given set of N + 1 data points $\{(x_0, y_0), (x_1, y_1), \ldots, (x_N, y_N)\}$, we want to find the coefficients of an Nth-degree polynomial function to match them:

$$p_N(x) = a_0 + a_1x + a_2x^2 + \cdots + a_Nx^N \qquad (3.1.1)$$

The coefficients can be obtained by solving the following system of linear equations.

$$\begin{aligned} a_0 + x_0a_1 + x_0^2a_2 + \cdots + x_0^Na_N &= y_0 \\ a_0 + x_1a_1 + x_1^2a_2 + \cdots + x_1^Na_N &= y_1 \\ &\;\;\vdots \\ a_0 + x_Na_1 + x_N^2a_2 + \cdots + x_N^Na_N &= y_N \end{aligned} \qquad (3.1.2)$$

¹ If we estimate the values of the unknown function at points inside/outside the range of the collected data points, we call it interpolation/extrapolation.
But, as the number of data points increases, so does the number of unknown variables and equations; consequently, the system may not be so easy to solve. That is why we look for alternatives to get the coefficients $\{a_0, a_1, \ldots, a_N\}$.
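For a handful of points, Eq. (3.1.2) can of course still be solved directly. The sketch below builds its coefficient matrix column by column (1, x, x^2, ..., x^N) for the four data points {(-2,-6), (-1,0), (1,0), (2,6)} used later in this section and solves it with the backslash operator. Note that the coefficients come out in ascending order a0, a1, ..., aN, whereas MATLAB polynomial routines such as polyval() expect descending order.

```matlab
% Solve the linear system (3.1.2) directly for four data points.
x = [-2 -1 1 2]; y = [-6 0 0 6];
N = length(x) - 1;                 % degree of the polynomial
V = zeros(N+1, N+1);
for j = 0:N
  V(:, j+1) = x(:).^j;             % column j+1 holds x_i^j, as in Eq. (3.1.2)
end
a = V\y(:)                         % [a0; a1; ...; aN], ascending powers
```

The result should be [0; -1; 0; 1], that is, p3(x) = x^3 - x.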

One of the alternatives is to make use of the Lagrange polynomials

$$l_N(x) = y_0\frac{(x - x_1)(x - x_2)\cdots(x - x_N)}{(x_0 - x_1)(x_0 - x_2)\cdots(x_0 - x_N)} + y_1\frac{(x - x_0)(x - x_2)\cdots(x - x_N)}{(x_1 - x_0)(x_1 - x_2)\cdots(x_1 - x_N)} + \cdots + y_N\frac{(x - x_0)(x - x_1)\cdots(x - x_{N-1})}{(x_N - x_0)(x_N - x_1)\cdots(x_N - x_{N-1})}$$

$$l_N(x) = \sum_{m=0}^{N} y_mL_{N,m}(x) \quad \text{with} \quad L_{N,m}(x) = \frac{\prod_{k \neq m}(x - x_k)}{\prod_{k \neq m}(x_m - x_k)} = \prod_{k \neq m}\frac{x - x_k}{x_m - x_k} \qquad (3.1.3)$$

It can easily be shown that the graph of this function matches every data point

$$l_N(x_m) = y_m \quad \forall\, m = 0, 1, \ldots, N \qquad (3.1.4)$$

since the Lagrange coefficient polynomial $L_{N,m}(x)$ is 1 only for $x = x_m$ and zero for all other data points $x = x_k$ ($k \neq m$). Note that the Nth-degree polynomial function matching the given N + 1 points is unique and so Eq. (3.1.1) having the coefficients obtained from Eq. (3.1.2) must be the same as the Lagrange polynomial (3.1.3).

Now, we have the MATLAB routine “lagranp()” which finds us the coefficients of Lagrange polynomial (3.1.3) together with each Lagrange coefficient polynomial $L_{N,m}(x)$. In order to understand this routine, you should know that MATLAB deals with polynomials as their coefficient vectors arranged in descending order and the multiplication of two polynomials corresponds to the convolution of the coefficient vectors as mentioned in Section 1.1.6.


function [l,L] = lagranp(x,y)
%Input : x = [x0 x1 ... xN], y = [y0 y1 ... yN]
%Output: l = Lagrange polynomial coefficients of degree N
%        L = Lagrange coefficient polynomial
N = length(x)-1; %the degree of polynomial
l = 0;
for m = 1:N + 1
  P = 1;
  for k = 1:N + 1
    if k ~= m, P = conv(P,[1 -x(k)])/(x(m)-x(k)); end
  end
  L(m,:) = P; %Lagrange coefficient polynomial
  l = l + y(m)*P; %Lagrange polynomial (3.1.3)
end

%do_lagranp.m
x = [-2 -1 1 2]; y = [-6 0 0 6]; % given data points
l = lagranp(x,y) % find the Lagrange polynomial
xx = [-2: 0.02 : 2]; yy = polyval(l,xx); %interpolate for [-2,2]
clf, plot(xx,yy,'b', x,y,'*') %plot the graph
Figure 3.1 The graph of a third-degree Lagrange polynomial.


We make the MATLAB program “do_lagranp.m” to use the routine “lagranp()” for finding the third-degree polynomial $l_3(x)$ which matches the four given points

{(-2, -6), (-1, 0), (1, 0), (2, 6)}

and to check if the graph of $l_3(x)$ really passes through the four points. The results from running this program are depicted in Fig. 3.1.

>>do_lagranp
l = 1 0 -1 0 % meaning l_3(x) = 1*x^3 + 0*x^2 - 1*x + 0
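Property (3.1.4) can also be verified numerically. The sketch below repeats the convolution loop of lagranp() inline (so that it runs as a stand-alone script) and evaluates every Lagrange coefficient polynomial at all the nodes; the resulting matrix should be the identity, and the accumulated polynomial should reproduce l = [1 0 -1 0].

```matlab
% Build each Lagrange coefficient polynomial by repeated convolution
% (the same loop as in lagranp()) and evaluate it at every node.
x = [-2 -1 1 2]; y = [-6 0 0 6];
N = length(x) - 1;
L = zeros(N+1, N+1);               % row m: coefficients of L_{N,m-1}(x), descending
l = zeros(1, N+1);                 % coefficients of the Lagrange polynomial
for m = 1:N+1
  P = 1;
  for k = 1:N+1
    if k ~= m, P = conv(P, [1 -x(k)])/(x(m) - x(k)); end
  end
  L(m,:) = P;
  l = l + y(m)*P;
end
D = zeros(N+1);
for m = 1:N+1
  D(m,:) = polyval(L(m,:), x);     % values of row m's polynomial at all nodes
end
D                                  % expected: the identity matrix, Eq. (3.1.4)
l                                  % expected: [1 0 -1 0], i.e. x^3 - x
```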


3.2 INTERPOLATION BY NEWTON POLYNOMIAL 

Although the Lagrange polynomial works pretty well for interpolation irrespective of the interval widths between the data points along the x-axis, it requires restarting the whole computation with heavier burden as data points are appended. Differently from this, the Nth-degree Newton polynomial matching the N + 1 data points $\{(x_0, y_0), (x_1, y_1), \ldots, (x_N, y_N)\}$ can be recursively obtained as the sum of the (N - 1)th-degree Newton polynomial matching the N data points $\{(x_0, y_0), (x_1, y_1), \ldots, (x_{N-1}, y_{N-1})\}$ and one additional term.

$$n_N(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + \cdots + a_N(x - x_0)(x - x_1)\cdots(x - x_{N-1})$$
$$= n_{N-1}(x) + a_N(x - x_0)(x - x_1)\cdots(x - x_{N-1}) \quad \text{with } n_0(x) = a_0 \qquad (3.2.1)$$

In order to derive a formula to find the successive coefficients $\{a_0, a_1, \ldots, a_N\}$ that make this equation accommodate the data points, we will determine $a_0$ and $a_1$ so that

$$n_1(x) = n_0(x) + a_1(x - x_0) \qquad (3.2.2)$$

matches the first two data points $(x_0, y_0)$ and $(x_1, y_1)$. We need to solve the two equations

$$n_1(x_0) = a_0 + a_1(x_0 - x_0) = y_0$$
$$n_1(x_1) = a_0 + a_1(x_1 - x_0) = y_1$$

to get

$$a_0 = y_0, \quad a_1 = \frac{y_1 - a_0}{x_1 - x_0} = \frac{y_1 - y_0}{x_1 - x_0} = Df_0 \qquad (3.2.3)$$

Starting from this first-degree Newton polynomial, we can proceed to the second-degree Newton polynomial

$$n_2(x) = n_1(x) + a_2(x - x_0)(x - x_1) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) \qquad (3.2.4)$$

which, with the same coefficients $a_0$ and $a_1$ as (3.2.3), still matches the first two data points $(x_0, y_0)$ and $(x_1, y_1)$, since the additional (third) term is zero at $(x_0, y_0)$ and $(x_1, y_1)$. This is to say that the additional polynomial term does not disturb the matching of previously existing data. Therefore, given the additional matching condition for the third data point $(x_2, y_2)$, we only have to solve

$$n_2(x_2) = a_0 + a_1(x_2 - x_0) + a_2(x_2 - x_0)(x_2 - x_1) = y_2$$

for only one more coefficient $a_2$ to get

$$a_2 = \frac{y_2 - a_0 - a_1(x_2 - x_0)}{(x_2 - x_0)(x_2 - x_1)} = \frac{y_2 - y_0 - \dfrac{y_1 - y_0}{x_1 - x_0}(x_2 - x_0)}{(x_2 - x_0)(x_2 - x_1)}$$

$$= \frac{y_2 - y_1 + y_1 - y_0 - \dfrac{y_1 - y_0}{x_1 - x_0}(x_2 - x_1 + x_1 - x_0)}{(x_2 - x_0)(x_2 - x_1)} = \frac{\dfrac{y_2 - y_1}{x_2 - x_1} - \dfrac{y_1 - y_0}{x_1 - x_0}}{x_2 - x_0} = \frac{Df_1 - Df_0}{x_2 - x_0} = D^2f_0 \qquad (3.2.5)$$

Generalizing these results (3.2.3) and (3.2.5) yields the formula for the $N$th coefficient $a_N$ of the Newton polynomial (3.2.1):

$$a_N = \frac{D^{N-1} f_1 - D^{N-1} f_0}{x_N - x_0} \equiv D^N f_0 \qquad (3.2.6)$$

This is the $N$th-order divided difference, which can be obtained successively along the $x_0$ row (the first row below the header) of Table 3.1.
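As a quick check of (3.2.6), the divided differences $D^k f_0$ can also be computed in place in a single vector, updating the entries from the bottom up so that each update still sees the previous column of the table. This is a minimal sketch with our own variable names (not the book's routine), using the data points of the example that appears later in this section.

```matlab
% In-place divided differences: after the loops, a(k+1) = D^k f_0, Eq.(3.2.6)
x = [-2 -1 1 2 4];          % data abscissas x_0 ... x_N
y = [-6  0 0 6 60];         % data ordinates y_0 ... y_N
N = length(x) - 1;
a = y;                      % zeroth-order differences are the y_k themselves
for k = 1:N                 % order of the divided difference being formed
  for m = N + 1:-1:k + 1    % bottom-up, so a(m - 1) still holds the order k-1 value
    a(m) = (a(m) - a(m - 1))/(x(m) - x(m - k));
  end
end
disp(a)                     % Newton coefficients a_0 ... a_N
```

Only one vector of length $N + 1$ is stored, instead of the full table kept in DD by newtonp(); the price is that the other entries of the table are overwritten.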
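Table 3.1 Divided Difference Table

| $x_k$ | $y_k$ | $Df_k$ | $D^2 f_k$ | $D^3 f_k$ | - |
|---|---|---|---|---|---|
| $x_0$ | $y_0$ | $Df_0 = \frac{y_1 - y_0}{x_1 - x_0}$ | $D^2 f_0 = \frac{Df_1 - Df_0}{x_2 - x_0}$ | $D^3 f_0 = \frac{D^2 f_1 - D^2 f_0}{x_3 - x_0}$ | - |
| $x_1$ | $y_1$ | $Df_1 = \frac{y_2 - y_1}{x_2 - x_1}$ | $D^2 f_1 = \frac{Df_2 - Df_1}{x_3 - x_1}$ | - | |
| $x_2$ | $y_2$ | $Df_2 = \frac{y_3 - y_2}{x_3 - x_2}$ | - | | |
| $x_3$ | $y_3$ | - | | | |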

function [n,DD] = newtonp(x,y)
%Input : x = [x0 x1 ... xN]
%        y = [y0 y1 ... yN]
%Output: n = Newton polynomial coefficients of degree N
N = length(x)-1;
DD = zeros(N + 1,N + 1);
DD(1:N + 1,1) = y';
for k = 2:N + 1
  for m = 1: N + 2 - k %Divided Difference Table
    DD(m,k) = (DD(m + 1,k - 1) - DD(m,k - 1))/(x(m + k - 1)- x(m));
  end
end
a = DD(1,:); %Eq.(3.2.6)
n = a(N+1); %Begin with Eq.(3.2.7)
for k = N:-1:1 %Eq.(3.2.7)
  n = [n a(k)] - [0 n*x(k)]; %n(x)*(x - x_(k-1)) + a_(k-1)
end


Note that, as mentioned in Section 1.3, it is computationally more efficient to write the Newton polynomial (3.2.1) in the nested multiplication form

$$n_N(x) = ((\cdots(a_N(x - x_{N-1}) + a_{N-1})(x - x_{N-2}) + \cdots) + a_1)(x - x_0) + a_0 \qquad (3.2.7)$$

and that the multiplication of two polynomials corresponds to the convolution of their coefficient vectors, as mentioned in Section 1.1.6. We have written the MATLAB routine “newtonp()” to compose a divided difference table like Table 3.1 and construct the Newton polynomial for a set of data points.
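The nested form (3.2.7) can also be used to evaluate $n_N(x)$ directly from the coefficients $a_k$ and the nodes $x_k$, without first expanding it into powers of $x$ as newtonp() does. The sketch below is only an illustration; it assumes the coefficients $a = [-6\ 6\ -2\ 1\ 0]$ obtained from Table 3.2 for the example that follows, whose polynomial turns out to be $x^3 - x$.

```matlab
% Nested (Horner-like) evaluation of a Newton polynomial, Eq.(3.2.7)
x = [-2 -1 1 2 4];              % nodes x_0 ... x_N
a = [-6 6 -2 1 0];              % coefficients a_0 ... a_N (divided differences D^k f_0)
t = linspace(-2,4,7);           % points at which n_N(t) is evaluated
N = length(a) - 1;
v = a(N + 1)*ones(size(t));     % innermost term a_N
for k = N:-1:1
  v = v.*(t - x(k)) + a(k);     % v <- v*(t - x_(k-1)) + a_(k-1)
end
disp(max(abs(v - (t.^3 - t))))  % deviation from the expanded form x^3 - x
```

Each pass costs one multiplication and one addition per evaluation point, which is the efficiency argument behind (3.2.7).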

For example, suppose we are to find a Newton polynomial matching the following data points

$$\{(-2, -6), (-1, 0), (1, 0), (2, 6), (4, 60)\}$$

From these data points, we construct the divided difference table as Table 3.2 and then use this table together with Eq. (3.2.1) to get the Newton polynomial as follows:
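Table 3.2 Divided differences

| $x_k$ | $y_k$ | $Df_k$ | $D^2 f_k$ | $D^3 f_k$ | $D^4 f_k$ |
|---|---|---|---|---|---|
| -2 | -6 | $\frac{0 - (-6)}{-1 - (-2)} = 6$ | $\frac{0 - 6}{1 - (-2)} = -2$ | $\frac{2 - (-2)}{2 - (-2)} = 1$ | $\frac{1 - 1}{4 - (-2)} = 0$ |
| -1 | 0 | $\frac{0 - 0}{1 - (-1)} = 0$ | $\frac{6 - 0}{2 - (-1)} = 2$ | $\frac{7 - 2}{4 - (-1)} = 1$ | |
| 1 | 0 | $\frac{6 - 0}{2 - 1} = 6$ | $\frac{27 - 6}{4 - 1} = 7$ | | |
| 2 | 6 | $\frac{60 - 6}{4 - 2} = 27$ | | | |
| 4 | 60 | | | | |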

$$n(x) = y_0 + Df_0(x - x_0) + D^2 f_0(x - x_0)(x - x_1) + D^3 f_0(x - x_0)(x - x_1)(x - x_2) + 0$$
$$= -6 + 6(x - (-2)) - 2(x - (-2))(x - (-1)) + 1(x - (-2))(x - (-1))(x - 1)$$
$$= -6 + 6(x + 2) - 2(x + 2)(x + 1) + (x + 2)(x^2 - 1)$$
$$= x^3 + (-2 + 2)x^2 + (6 - 6 - 1)x - 6 + 12 - 4 - 2 = x^3 - x$$

We might begin not necessarily with the first data point but with, say, the third one $(1, 0)$, and proceed as follows to end up with the same result. (The cubic-term coefficient is $D^3 f_1 = 1$ from Table 3.2, which involves $x_1, x_2, x_3, x_4$; this is allowed since a divided difference does not depend on the order of its points.)

$$n(x) = y_2 + Df_2(x - x_2) + D^2 f_2(x - x_2)(x - x_3) + D^3 f_1(x - x_2)(x - x_3)(x - x_4) + 0$$
$$= 0 + 6(x - 1) + 7(x - 1)(x - 2) + 1(x - 1)(x - 2)(x - 4)$$
$$= 6(x - 1) + 7(x^2 - 3x + 2) + (x^2 - 3x + 2)(x - 4)$$
$$= x^3 + (7 - 7)x^2 + (6 - 21 + 14)x - 6 + 14 - 8 = x^3 - x$$

This process is cast into the MATLAB program “do_newtonp.m”, which illustrates that the Newton polynomial (3.2.1) does not depend on the order of the data points; that is, changing the order of the data points does not make any difference.
%do_newtonp.m
x = [-2 -1 1 2 4]; y = [-6 0 0 6 60];
n = newtonp(x,y) %l = lagranp(x,y) for comparison
x = [-1 -2 1 2 4]; y = [ 0 -6 0 6 60];
n1 = newtonp(x,y) %with the order of data changed for comparison
xx = [-2:0.02: 2]; yy = polyval(n,xx);
clf, plot(xx,yy,'b-',x,y,'*')


Now, let us look at the interpolation problem from the viewpoint of approximation. For this purpose, suppose we are to approximate some function, say,

$$f(x) = \frac{1}{1 + 8x^2}$$

by a polynomial. We first pick some sample points on the graph of this function, such as those listed below, and look for the polynomial functions $n_4(x)$, $n_8(x)$, and $n_{10}(x)$ that match each of the three sets of points, respectively.


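Points for $n_4(x)$: $x_k$ = -1.0, -0.5, 0, 0.5, 1.0; $y_k$ = 1/9, 1/3, 1, 1/3, 1/9

Points for $n_8(x)$: $x_k$ = -1.0, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1.0; $y_k$ = 1/9, 2/11, 1/3, 2/3, 1, 2/3, 1/3, 2/11, 1/9

Points for $n_{10}(x)$: $x_k$ = -1.0, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0; $y_k$ = 1/9, 25/153, 25/97, 25/57, 25/33, 1, 25/33, 25/57, 25/97, 25/153, 1/9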


We made the MATLAB program “do_newtonp1.m” to do this job and plot the graphs of the polynomial functions together with the graph of the true function $f(x)$ and their error functions separately for comparison, as depicted in Fig. 3.2, where the parts for $n_8(x)$ and $n_{10}(x)$ are omitted to give the readers some room for practice.


%do_newtonp1.m - plot Fig.3.2
x = [-1 -0.5 0 0.5 1.0]; y = f31(x);
n = newtonp(x,y)
xx = [-1:0.02: 1]; %the interval to look over
yy = f31(xx); %graph of the true function
yy1 = polyval(n,xx); %graph of the approximate polynomial function
subplot(221), plot(xx,yy,'k-', x,y,'o', xx,yy1,'b')
subplot(222), plot(xx,yy1-yy,'r') %graph of the error function

%f31.m (a separate file, or a local function at the end of the script)
function y = f31(x)
y=1./(1+8*x.^2);
Figure 3.2 Interpolation from the viewpoint of approximation. (a) 4/8/10th-degree polynomial approximation; (b) the error between the approximating polynomial and the true function: $n_4(x) - f(x)$, $n_8(x) - f(x)$, $n_{10}(x) - f(x)$.


Remark 3.1. Polynomial Wiggle and Runge Phenomenon. Here is one thing to note. Strangely, increasing the degree of the polynomial contributes little to reducing the approximation error. Contrary to our usual expectation, it tends to make the oscillation strikingly large, which is called the polynomial wiggle, and the error gets bigger near both ends of the interval, as can be seen in Fig. 3.2, which is called the Runge phenomenon. That is why polynomials of degree 5 or above are seldom used for the purpose of interpolation, unless they are known to fit the data.


3.3 APPROXIMATION BY CHEBYSHEV POLYNOMIAL 


At the end of the previous section, we considered the polynomial approximation problem of finding a polynomial close to a given (true) function $f(x)$, where we have the freedom to pick the target points $\{x_0, x_1, \ldots, x_N\}$ in our own way. Once the target points have been fixed, it is nothing but an interpolation problem that can be solved by the Lagrange or Newton polynomial.

In this section, we will think about how to choose the target points for better approximation, rather than taking equidistant points along the x axis. Noting that the error tends to get bigger near both ends of the interval when we chose equidistant target points, it may be helpful to place the target points more densely near both ends than in the middle. In this context, a possible choice is the projection (onto the x axis) of equidistant points on the circle centered at the midpoint of the interval (see Fig. 3.3). That is, we can choose in the normalized interval $[-1, +1]$

$$x'_k = \cos\frac{2N + 1 - 2k}{2(N + 1)}\pi \quad \text{for } k = 0, 1, \ldots, N \qquad (3.3.1a)$$

and for an arbitrary interval $[a, b]$

$$x_k = \frac{b - a}{2}x'_k + \frac{a + b}{2} = \frac{b - a}{2}\cos\frac{2N + 1 - 2k}{2(N + 1)}\pi + \frac{a + b}{2} \quad \text{for } k = 0, 1, \ldots, N \qquad (3.3.1b)$$
which are referred to as the Chebyshev nodes. The approximating polynomial obtained on the basis of these Chebyshev nodes is called the Chebyshev polynomial. Let us try the Chebyshev nodes on approximating the function

$$f(x) = \frac{1}{1 + 8x^2}$$

We can set the 5/9/11 Chebyshev nodes by Eq. (3.3.1) and get the Lagrange or Newton polynomials $c_4(x)$, $c_8(x)$, and $c_{10}(x)$ matching these target points, which are called Chebyshev polynomials. We make the MATLAB program “do_lagnewch.m” to do this job and plot the graphs of the polynomial functions together with the graph of the true function $f(x)$ and their error functions separately for comparison, as depicted in Fig. 3.4. The parts for $c_8(x)$ and $c_{10}(x)$ are omitted to give the readers a chance to practice what they have learned in this section.


%do_lagnewch.m - plot Fig.3.4
N = 4; k = [0:N];
x=cos((2*N + 1 - 2*k)*pi/2/(N + 1)); %Chebyshev nodes(Eq.(3.3.1))
y=f31(x);
c=newtonp(x,y) %Chebyshev polynomial
xx = [-1:0.02: 1]; %the interval to look over
yy = f31(xx); %graph of the true function
yy1 = polyval(c,xx); %graph of the approximate polynomial function
subplot(221), plot(xx,yy,'k-', x,y,'o', xx,yy1,'b')
subplot(222), plot(xx,yy1-yy,'r') %graph of the error function


Comparing Fig. 3.4 with Fig. 3.2, we see that the maximum deviation of the Chebyshev polynomial from the true function is considerably less than that of the Lagrange/Newton polynomial with equidistant nodes. It can also be seen that increasing the number of Chebyshev nodes (or, equivalently, the degree of the Chebyshev polynomial) contributes substantially to reducing the approximation error.

Figure 3.4 Approximation using the Chebyshev polynomial. (Panel (b): the error between the approximating polynomial and the true function.)
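The comparison between Figs. 3.2 and 3.4 can also be made numerically. The following is a minimal sketch, not one of the book's programs: it uses MATLAB's polyfit() with degree $N$ through $N + 1$ points (which gives the interpolating polynomial) in place of newtonp(), an anonymous function in place of f31(), and a dense grid of 2001 points (our choice) to estimate the maximum error on $[-1, 1]$.

```matlab
% Max interpolation error for f(x) = 1/(1+8x^2): equidistant vs. Chebyshev nodes
f = @(x) 1./(1 + 8*x.^2);
xx = linspace(-1,1,2001);                       % dense grid for measuring the error
Ns = [4 8 10];                                  % degrees used in Figs. 3.2 and 3.4
errEq = zeros(size(Ns)); errCh = zeros(size(Ns));
for i = 1:length(Ns)
  N = Ns(i); k = 0:N;
  xe = linspace(-1,1,N + 1);                    % equidistant nodes
  xc = cos((2*N + 1 - 2*k)*pi/(2*(N + 1)));     % Chebyshev nodes, Eq.(3.3.1a)
  pe = polyfit(xe,f(xe),N);                     % degree-N interpolant (N+1 points)
  pc = polyfit(xc,f(xc),N);
  errEq(i) = max(abs(polyval(pe,xx) - f(xx)));
  errCh(i) = max(abs(polyval(pc,xx) - f(xx)));
end
disp([Ns; errEq; errCh])                        % rows: degree, equidistant error, Chebyshev error
```

The printed table should show the same trend as the figures: the equidistant error does not shrink with the degree, while the Chebyshev error does.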

There are several things to note about the Chebyshev polynomial. 

Remark 3.2. Chebyshev Nodes and Chebyshev Coefficient Polynomials $T_m(x')$

1. The Chebyshev coefficient polynomial is defined as

$$T_{N+1}(x') = \cos((N + 1)\cos^{-1}x') \quad \text{for } -1 \le x' \le +1 \qquad (3.3.2)$$

and the Chebyshev nodes defined by Eq. (3.3.1a) are actually the zeros of this function:

$$T_{N+1}(x'_k) = \cos((N + 1)\cos^{-1}x'_k) = 0, \qquad (N + 1)\cos^{-1}x'_k = (2k' + 1)\pi/2 \quad (k' = N - k)$$

2. Equation (3.3.2) can be written via the trigonometric formula in a recursive form as

$$T_{N+1}(x) = \cos(\cos^{-1}x + N\cos^{-1}x)$$
$$= \cos(\cos^{-1}x)\cos(N\cos^{-1}x) - \sin(\cos^{-1}x)\sin(N\cos^{-1}x)$$
$$= xT_N(x) + \frac{1}{2}\{\cos((N + 1)\cos^{-1}x) - \cos((N - 1)\cos^{-1}x)\}$$
$$= xT_N(x) + \frac{1}{2}T_{N+1}(x) - \frac{1}{2}T_{N-1}(x)$$

$$T_{N+1}(x) = 2xT_N(x) - T_{N-1}(x) \quad \text{for } N \ge 1 \qquad (3.3.3a)$$
$$T_0(x) = \cos 0 = 1, \qquad T_1(x) = \cos(\cos^{-1}x) = x \qquad (3.3.3b)$$

3. At the Chebyshev nodes $x'_k$ defined by Eq. (3.3.1a), the Chebyshev coefficient polynomials

$$\{T_0(x'), T_1(x'), \ldots, T_N(x')\}$$

are orthogonal in the sense that

$$\sum_{k=0}^{N} T_m(x'_k)T_n(x'_k) = 0 \quad \text{for } m \ne n \qquad (3.3.4a)$$
$$\sum_{k=0}^{N} T_m^2(x'_k) = \frac{N + 1}{2} \quad \text{for } m \ne 0 \qquad (3.3.4b)$$
$$\sum_{k=0}^{N} T_0^2(x'_k) = N + 1 \quad \text{for } m = 0 \qquad (3.3.4c)$$


4. The Chebyshev coefficient polynomials $T_{N+1}(x')$ for up to $N = 6$ are collected in Table 3.3, and their graphs are depicted in Fig. 3.5. As can be seen from the table or the graphs, the Chebyshev coefficient polynomials of even/odd degree $(N + 1)$ are even/odd functions and have an equi-ripple characteristic with the range $[-1, +1]$, and the number of rising/falling intervals within the domain $[-1, +1]$ is $N + 1$.
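Items 1 to 3 of Remark 3.2 can be checked with a few lines of MATLAB. This sketch (our own code, not a routine of the book) builds the coefficient vectors of $T_0, \ldots, T_{N+1}$ with the recursion (3.3.3) using conv(), compares one of them with Table 3.3, confirms that the Chebyshev nodes (3.3.1a) are zeros of $T_{N+1}$, and forms the Gram matrix of the sums in (3.3.4). $N = 6$ is chosen to match the table.

```matlab
% Chebyshev coefficient polynomials via the recursion (3.3.3), and two checks
N = 6;
T = cell(1,N + 2);            % T{m+1} holds the coefficients of T_m, descending powers
T{1} = 1; T{2} = [1 0];       % T_0 = 1, T_1 = x, Eq.(3.3.3b)
for m = 2:N + 1
  T{m + 1} = 2*conv([1 0],T{m}) - [0 0 T{m - 1}];   % T_m = 2x T_(m-1) - T_(m-2), Eq.(3.3.3a)
end
disp(T{5})                    % T_4(x'), to be compared with Table 3.3

% The Chebyshev nodes (3.3.1a) are the zeros of T_(N+1)
k = 0:N; xk = cos((2*N + 1 - 2*k)*pi/(2*(N + 1)));
disp(max(abs(polyval(T{N + 2},xk))))

% Discrete orthogonality (3.3.4): G(m+1,n+1) = sum_k T_m(x'_k) T_n(x'_k), m,n = 0..N
V = zeros(N + 1);
for m = 0:N, V(m + 1,:) = polyval(T{m + 1},xk); end
G = V*V';                     % should equal diag([N+1, (N+1)/2, ..., (N+1)/2])
disp(round(G*1e10)/1e10)
```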


We can make use of the orthogonality [Eq. (3.3.4)] of Chebyshev coefficient 
polynomials to derive the Chebyshev polynomial approximation formula. 



Figure 3.5 Chebyshev polynomial functions. 
Table 3.3 Chebyshev Coefficient Polynomials

$T_0(x') = 1$
$T_1(x') = x'$ ($x'$: a variable normalized onto $[-1, 1]$)
$T_2(x') = 2x'^2 - 1$
$T_3(x') = 4x'^3 - 3x'$
$T_4(x') = 8x'^4 - 8x'^2 + 1$
$T_5(x') = 16x'^5 - 20x'^3 + 5x'$
$T_6(x') = 32x'^6 - 48x'^4 + 18x'^2 - 1$
$T_7(x') = 64x'^7 - 112x'^5 + 56x'^3 - 7x'$


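The resulting Chebyshev polynomial approximation of degree $N$ on $[a, b]$ is

$$f(x) \cong c_N(x) = \sum_{m=0}^{N} d_m T_m(x')\Big|_{x' = \frac{2x - (a + b)}{b - a}} \qquad (3.3.5)$$

where

$$d_0 = \frac{1}{N + 1}\sum_{k=0}^{N} f(x_k)T_0(x'_k) = \frac{1}{N + 1}\sum_{k=0}^{N} f(x_k) \qquad (3.3.6a)$$
$$d_m = \frac{2}{N + 1}\sum_{k=0}^{N} f(x_k)T_m(x'_k) = \frac{2}{N + 1}\sum_{k=0}^{N} f(x_k)\cos\frac{m(2N + 1 - 2k)}{2(N + 1)}\pi \quad \text{for } m = 1, 2, \ldots, N \qquad (3.3.6b)$$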


function [c,x,y] = cheby(f,N,a,b)
%Input : f = function name on [a,b]
%Output: c = coefficients of the Chebyshev polynomial of degree N
%        (x,y) = Chebyshev nodes
if nargin == 2, a = -1; b = 1; end
k = [0: N];
theta = (2*N + 1 - 2*k)*pi/(2*N + 2);
xn = cos(theta); %Eq.(3.3.1a)
x = (b - a)/2*xn +(a + b)/2; %Eq.(3.3.1b)
y = feval(f,x);
d(1) = y*ones(N + 1,1)/(N + 1); %Eq.(3.3.6a)
for m = 2: N + 1
  cos_mth = cos((m - 1)*theta);
  d(m) = y*cos_mth'*2/(N + 1); %Eq.(3.3.6b)
end
xn = [2 -(a + b)]/(b - a); %the inverse of (3.3.1b)
T_0 = 1; T_1 = xn; %Eq.(3.3.3b)
c = d(1)*[0 T_0] + d(2)*T_1; %Eq.(3.3.5)
for m = 3: N + 1
  tmp = T_1;
  T_1 = 2*conv(xn,T_1) - [0 0 T_0]; %Eq.(3.3.3a)
  T_0 = tmp;
  c = [0 c] + d(m)*T_1; %Eq.(3.3.5)
end
We can apply this formula to get the polynomial approximation directly for a given function $f(x)$, without having to resort to the Lagrange or Newton polynomial. Given a function, the degree of the approximate polynomial, and the left/right boundary points of the interval, the above MATLAB routine “cheby()” uses this formula to make the Chebyshev polynomial approximation.

The following example illustrates that this formula gives the same approximate polynomial function as could be obtained by applying the Newton polynomial with the Chebyshev nodes.

Example 3.1. Approximation by Chebyshev Polynomial. Consider the problem of finding the second-degree ($N = 2$) polynomial to approximate the function $f(x) = 1/(1 + 8x^2)$. We make the following program “do_cheby.m”, which uses the MATLAB routine “cheby()” for this job and uses the Lagrange/Newton polynomial with the Chebyshev nodes to do the same job. Readers can run this program to check if the results are the same.


%do_cheby.m
N = 2; a = -2; b = 2;
[c,x1,y1] = cheby('f31',N,a,b) %Chebyshev polynomial ftn
%for comparison with Lagrange/Newton polynomial ftn
k = [0:N]; xn = cos((2*N + 1 - 2*k)*pi/2/(N + 1)); %Eq.(3.3.1a): Chebyshev nodes
x = ((b - a)*xn + a + b)/2; %Eq.(3.3.1b)
y = f31(x); n = newtonp(x,y), l = lagranp(x,y)

>>do_cheby
c = -0.3200 -0.0000 1.0000
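The claim of Example 3.1 can be checked without cheby(), newtonp(), or lagranp(). The sketch below (our own code, assuming an anonymous function in place of f31()) computes $d_0, \ldots, d_N$ by (3.3.6a)-(3.3.6b), evaluates $\sum_m d_m T_m(x')$ with $T_m(x') = \cos(m\cos^{-1}x')$ and $x' = (2x - (a + b))/(b - a)$, and compares the result with the polynomial that polyfit() passes through the same $N + 1$ Chebyshev nodes.

```matlab
% Chebyshev expansion with d_m from Eq.(3.3.6) vs. interpolation at the nodes
f = @(x) 1./(1 + 8*x.^2);
N = 2; a = -2; b = 2;
k = 0:N; theta = (2*N + 1 - 2*k)*pi/(2*(N + 1));
x = (b - a)/2*cos(theta) + (a + b)/2;         % Chebyshev nodes on [a,b], Eq.(3.3.1b)
y = f(x);
d = zeros(1,N + 1);
d(1) = sum(y)/(N + 1);                        % d_0, Eq.(3.3.6a)
for m = 1:N
  d(m + 1) = 2/(N + 1)*sum(y.*cos(m*theta));  % d_m, Eq.(3.3.6b); cos(m*theta_k) = T_m(x'_k)
end
t = linspace(a,b,9);                          % test points in [a,b]
tn = (2*t - (a + b))/(b - a);                 % the same points normalized to [-1,1]
cN = zeros(size(t));
for m = 0:N
  cN = cN + d(m + 1)*cos(m*acos(tn));         % sum of d_m T_m(x')
end
p = polyfit(x,y,N);                           % degree-N polynomial through the nodes
disp(p)
disp(max(abs(cN - polyval(p,t))))
```

Both routes give the same polynomial because a polynomial of degree $N$ through $N + 1$ distinct points is unique, and the orthogonality (3.3.4) makes the expansion pass through the nodes.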


3.4 PADE APPROXIMATION BY RATIONAL FUNCTION 

Pade approximation tries to approximate a function $f(x)$ around a point $x^o$ by a rational function

$$p_{M,N}(x - x^o) = \frac{Q_M(x - x^o)}{D_N(x - x^o)} \quad \text{with } M = N \text{ or } M = N + 1$$
$$= \frac{q_0 + q_1(x - x^o) + q_2(x - x^o)^2 + \cdots + q_M(x - x^o)^M}{1 + d_1(x - x^o) + d_2(x - x^o)^2 + \cdots + d_N(x - x^o)^N} \qquad (3.4.1)$$

where $f(x^o), f'(x^o), f^{(2)}(x^o), \ldots, f^{(M+N)}(x^o)$ are known.

How do we find such a rational function? We write the Taylor series expansion of $f(x)$ up to degree $M + N$ at $x = x^o$ as

$$f(x) \approx T_{M+N}(x - x^o) = f(x^o) + f'(x^o)(x - x^o) + \frac{f^{(2)}(x^o)}{2!}(x - x^o)^2 + \cdots + \frac{f^{(M+N)}(x^o)}{(M + N)!}(x - x^o)^{M+N}$$
$$= a_0 + a_1(x - x^o) + a_2(x - x^o)^2 + \cdots + a_{M+N}(x - x^o)^{M+N} \qquad (3.4.2)$$

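Assuming $x^o = 0$ for simplicity, we get the coefficients of $D_N(x)$ and $Q_M(x)$ such that, up to the terms of degree $M + N$,

$$\frac{(a_0 + a_1x + \cdots + a_{M+N}x^{M+N})(1 + d_1x + \cdots + d_Nx^N) - (q_0 + q_1x + \cdots + q_Mx^M)}{1 + d_1x + d_2x^2 + \cdots + d_Nx^N} = 0$$
$$(a_0 + a_1x + \cdots + a_{M+N}x^{M+N})(1 + d_1x + \cdots + d_Nx^N) = q_0 + q_1x + \cdots + q_Mx^M \qquad (3.4.3)$$

by solving the following equations:

$$a_0 = q_0$$
$$a_1 + a_0d_1 = q_1$$
$$a_2 + a_1d_1 + a_0d_2 = q_2 \qquad (3.4.4a)$$
$$\cdots$$
$$a_M + a_{M-1}d_1 + a_{M-2}d_2 + \cdots + a_{M-N}d_N = q_M$$

$$a_{M+1} + a_Md_1 + a_{M-1}d_2 + \cdots + a_{M-N+1}d_N = 0$$
$$a_{M+2} + a_{M+1}d_1 + a_Md_2 + \cdots + a_{M-N+2}d_N = 0 \qquad (3.4.4b)$$
$$\cdots$$
$$a_{M+N} + a_{M+N-1}d_1 + a_{M+N-2}d_2 + \cdots + a_Md_N = 0$$

Here, we must first solve Eq. (3.4.4b) for $d_1, d_2, \ldots, d_N$ and then substitute the $d_i$'s into Eq. (3.4.4a) to obtain $q_0, q_1, \ldots, q_M$.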

The MATLAB routine “padeap()” implements this scheme to find the coefficient vectors of the numerator/denominator polynomials $Q_M(x)/D_N(x)$ of the Pade approximation for a given function $f(x)$. Note the following things:

• The derivatives $f'(x^o), f^{(2)}(x^o), \ldots, f^{(M+N)}(x^o)$ up to order $(M + N)$ are computed numerically by using the routine “difapx()”, which will be introduced in Section 5.3.

• In order to compute the values of the Pade approximate function, we substitute $(x - x^o)$ for $x$ in $p_{M,N}(x)$, which has been obtained with the assumption that $x^o = 0$.
function [num,den] = padeap(f,xo,M,N,x0,xf)
%Input : f = function to be approximated around xo in [x0, xf]
%Output: num = numerator coeffs of Pade approximation of degree M
%        den = denominator coeffs of Pade approximation of degree N
a(1) = feval(f,xo);
h = .01; tmp = 1;
for i = 1:M + N
  tmp = tmp*i*h; %i!h^i
  dix = difapx(i,[-i i])*feval(f,xo + [-i:i]*h)'; %derivative(Section 5.3)
  a(i + 1) = dix/tmp; %Taylor series coefficient
end
for m = 1:N
  n = 1:N; A(m,n) = a(M + 1 + m - n);
  b(m) = -a(M + 1 + m);
end
d = A\b'; %Eq.(3.4.4b)
for m = 1: M + 1
  mm = min(m - 1,N);
  q(m) = a(m:-1:m - mm)*[1; d(1:mm)]; %Eq.(3.4.4a)
end
num = q(M + 1:-1:1)/d(N); den = [d(N:-1:1)' 1]/d(N); %descending order
if nargout == 0 % plot the true ftn, Pade ftn and Taylor expansion
  if nargin < 6, x0 = xo - 1; xf = xo + 1; end
  x = x0 + [xf - x0]/100*[0:100]; yt = feval(f,x);
  x1 = x - xo; yp = polyval(num,x1)./polyval(den,x1);
  yT = polyval(a(M + N + 1:-1:1),x1);
  clf, plot(x,yt,'k', x,yp,'r', x,yT,'b')
end


Example 3.2. Pade Approximation for $f(x) = e^x$. Let's find the Pade approximation $p_{3,2}(x) = Q_3(x)/D_2(x)$ for $f(x) = e^x$ around $x^o = 0$. We make the MATLAB program “do_pade.m”, which uses the routine “padeap()” for this job and uses it again with no output argument to see the graphic results, as depicted in Fig. 3.6.

>>do_pade %Pade approximation
n = 0.3333 2.9996 11.9994 19.9988
d = 1.0000 -7.9997 19.9988


%do_pade.m to get the Pade approximation for f(x) = e^x
f1 = inline('exp(x)','x');
M = 3; N = 2; %the degrees of Numerator Q(x) and Denominator D(x)
xo = 0; %the center of Taylor series expansion
[n,d] = padeap(f1,xo,M,N) %to get the coefficients of Q(x)/D(x)
x0 = -3.5; xf = 0.5; %left/right boundary of the interval
padeap(f1,xo,M,N,x0,xf) %to see the graphic results


To confirm and support this result from the analytical point of view and to help the readers understand the internal mechanism, we perform the hand-calculation procedure. First, we write the Taylor series expansion at $x = 0$ up to degree $M + N = 5$ for the given function $f(x) = e^x$ as

$$T_5(x) = \sum_{m=0}^{5}\frac{f^{(m)}(0)}{m!}x^m = 1 + x + \frac{1}{2}x^2 + \frac{1}{3!}x^3 + \frac{1}{4!}x^4 + \frac{1}{5!}x^5 \qquad (E3.2.1)$$

whose coefficients are

$$a_0 = 1, \quad a_1 = 1, \quad a_2 = \frac{1}{2}, \quad a_3 = \frac{1}{6}, \quad a_4 = \frac{1}{24}, \quad a_5 = \frac{1}{120} \qquad (E3.2.2)$$

Figure 3.6 Pade approximation and Taylor series expansion for $f(x) = e^x$ (Example 3.2).




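We put this into Eq. (3.4.4b) with $M = 3$, $N = 2$ and solve it for the $d_i$'s to get $D_2(x) = 1 + d_1x + d_2x^2$:

$$a_4 + a_3d_1 + a_2d_2 = 0$$
$$a_5 + a_4d_1 + a_3d_2 = 0$$
$$\begin{bmatrix}1/6 & 1/2\\ 1/24 & 1/6\end{bmatrix}\begin{bmatrix}d_1\\ d_2\end{bmatrix} = \begin{bmatrix}-1/24\\ -1/120\end{bmatrix}, \qquad \begin{bmatrix}d_1\\ d_2\end{bmatrix} = \begin{bmatrix}-2/5\\ 1/20\end{bmatrix} \qquad (E3.2.3)$$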


Substituting this into Eq. (3.4.4a) yields

$$q_0 = a_0 = 1$$
$$q_1 = a_1 + a_0d_1 = 1 + 1 \times (-2/5) = 3/5$$
$$q_2 = a_2 + a_1d_1 + a_0d_2 = 1/2 + 1 \times (-2/5) + 1 \times (1/20) = 3/20$$
$$q_3 = a_3 + a_2d_1 + a_1d_2 = 1/6 + (1/2) \times (-2/5) + 1 \times (1/20) = 1/60 \qquad (E3.2.4)$$
With these coefficients, we write the Pade approximate function as

$$p_{3,2}(x) = \frac{Q_3(x)}{D_2(x)} = \frac{1 + (3/5)x + (3/20)x^2 + (1/60)x^3}{1 + (-2/5)x + (1/20)x^2} = \frac{(1/3)x^3 + 3x^2 + 12x + 20}{x^2 - 8x + 20}$$
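The hand calculation can be repeated in MATLAB with the exact Taylor coefficients $a_j = 1/j!$ instead of the numerical derivatives used by padeap(). This is a minimal sketch of (3.4.4b) followed by (3.4.4a); the variable names are ours, and the last two lines normalize the result in the same way as padeap() so that it can be compared with the program output above.

```matlab
% Pade approximation p_{3,2} of e^x at x = 0 from exact Taylor coefficients
M = 3; N = 2;
a = 1./factorial(0:M + N);          % a(j+1) = a_j = 1/j!
A = zeros(N); rhs = zeros(N,1);
for m = 1:N                         % Eq.(3.4.4b): sum_n a_(M+m-n) d_n = -a_(M+m)
  for n = 1:N
    A(m,n) = a(M + m - n + 1);
  end
  rhs(m) = -a(M + m + 1);
end
d = A\rhs;                          % [d_1; d_2]
dd = [1; d];                        % [d_0 = 1; d_1; ...; d_N]
q = zeros(1,M + 1);
for m = 0:M                         % Eq.(3.4.4a): q_m = sum_(n=0)^(min(m,N)) a_(m-n) d_n
  for n = 0:min(m,N)
    q(m + 1) = q(m + 1) + a(m - n + 1)*dd(n + 1);
  end
end
disp(d'), disp(q)
num = fliplr(q)/d(N); den = [flipud(d)' 1]/d(N);   % descending order, scaled by d_N
t = linspace(-1,1,5);
disp(max(abs(polyval(num,t)./polyval(den,t) - exp(t))))   % error of p_{3,2} on [-1,1]
```

With exact coefficients the normalized vectors are exactly [1/3 3 12 20] and [1 -8 20], which the numerically differentiated values printed by do_pade only approximate.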


3.5 INTERPOLATION BY CUBIC SPLINE 

If we use the Lagrange/Newton polynomial to interpolate a given set of $N + 1$ data points, the polynomial is usually of degree $N$ and so can have up to $N - 1$ local extrema (maxima/minima). Thus, it will show a wild swing/oscillation (called 'polynomial wiggle'), particularly near the ends of the whole interval, as the number of data points increases and so the degree of the polynomial gets higher, as illustrated in Fig. 3.2. Then, how about a piecewise approach, like assigning an individual approximate polynomial to every subinterval between data points? How about just a linear interpolation, that is, connecting the data points by straight lines? It is simple, but it falls short of smoothness. Even with second-degree polynomials, the piecewise-quadratic curve is not smooth enough to please our eyes, since the second-order derivatives of the quadratic polynomials for adjacent subintervals can't be made to conform with each other. In real life, there are many cases where the continuity of second-order derivatives is desirable. For example, it is very important to ensure smoothness up to order 2 for the interpolation needed in CAD (computer-aided design)/CAM (computer-aided manufacturing), computer graphics, and robot path/trajectory planning. That's why we often resort to the piecewise-cubic curve constructed from the individual third-degree polynomials assigned to each subinterval, which is called the cubic spline interpolation. (A spline is a kind of template that architects use to draw a smooth curve between two points.)

For a given set of data points $\{(x_k, y_k), k = 0 : N\}$, the cubic spline $s(x)$ consists of $N$ cubic polynomials $s_k(x)$ assigned to each subinterval, satisfying the following constraints (S0)-(S4).

(S0) $s(x) = s_k(x) = S_{k,3}(x - x_k)^3 + S_{k,2}(x - x_k)^2 + S_{k,1}(x - x_k) + S_{k,0}$ for $x \in [x_k, x_{k+1}]$, $k = 0 : N$

(S1) $s_k(x_k) = S_{k,0} = y_k$ for $k = 0 : N$

(S2) $s_{k-1}(x_k) \equiv s_k(x_k) = S_{k,0} = y_k$ for $k = 1 : N - 1$

(S3) $s'_{k-1}(x_k) \equiv s'_k(x_k) = S_{k,1}$ for $k = 1 : N - 1$

(S4) $s''_{k-1}(x_k) \equiv s''_k(x_k) = 2S_{k,2}$ for $k = 1 : N - 1$

These constraints (S1)-(S4) amount to a set of $N + 1 + 3(N - 1) = 4N - 2$ linear equations having the $4N$ coefficients of the $N$ cubic polynomials

$$\{S_{k,0}, S_{k,1}, S_{k,2}, S_{k,3}, k = 0 : N - 1\}$$
as their unknowns. The two additional equations necessary for the system to be solvable are supposed to come from the boundary conditions for the first/second-order derivatives at the end points $(x_0, y_0)$ and $(x_N, y_N)$, as listed in Table 3.4.

Table 3.4 Boundary Conditions for a Cubic Spline

| | Type | Conditions |
|---|---|---|
| (i) | First-order derivatives specified | $s'_0(x_0) = S_{0,1}$, $s'_N(x_N) = S_{N,1}$ |
| (ii) | Second-order derivatives specified (end-curvature adjusted) | $s''_0(x_0) = 2S_{0,2}$, $s''_N(x_N) = 2S_{N,2}$ |
| (iii) | Second-order derivatives extrapolated | $s''_0(x_0) \equiv s''_1(x_1) + \frac{h_0}{h_1}(s''_1(x_1) - s''_2(x_2))$, $s''_N(x_N) \equiv s''_{N-1}(x_{N-1}) + \frac{h_{N-1}}{h_{N-2}}(s''_{N-1}(x_{N-1}) - s''_{N-2}(x_{N-2}))$ |

Now, noting from (S1) that $S_{k,0} = y_k$, we will arrange the constraints (S2)-(S4) and eliminate $S_{k,1}$ and $S_{k,3}$ to set up a set of equations with respect to the $N + 1$ unknowns $\{S_{k,2}, k = 0 : N\}$. In order to do so, we denote each interval width by $h_k = x_{k+1} - x_k$ and substitute (S0) into (S4) to write

$$s''_k(x_{k+1}) = 6S_{k,3}h_k + 2S_{k,2} \equiv s''_{k+1}(x_{k+1}) = 2S_{k+1,2}$$
$$S_{k,3}h_k = \frac{1}{3}(S_{k+1,2} - S_{k,2}) \qquad (3.5.1a)$$
$$S_{k-1,3}h_{k-1} = \frac{1}{3}(S_{k,2} - S_{k-1,2}) \qquad (3.5.1b)$$

We substitute these equations into (S2) with $k + 1$ in place of $k$

$$s_k(x_{k+1}) = S_{k,3}(x_{k+1} - x_k)^3 + S_{k,2}(x_{k+1} - x_k)^2 + S_{k,1}(x_{k+1} - x_k) + S_{k,0} \equiv y_{k+1}$$
$$S_{k,3}h_k^3 + S_{k,2}h_k^2 + S_{k,1}h_k + y_k = y_{k+1}$$

to eliminate the $S_{k,3}$'s and rewrite it as

$$\frac{h_k}{3}(S_{k+1,2} - S_{k,2}) + S_{k,2}h_k + S_{k,1} = \frac{y_{k+1} - y_k}{h_k} \equiv dy_k$$
$$h_k(S_{k+1,2} + 2S_{k,2}) + 3S_{k,1} = 3dy_k \qquad (3.5.2a)$$
$$h_{k-1}(S_{k,2} + 2S_{k-1,2}) + 3S_{k-1,1} = 3dy_{k-1} \qquad (3.5.2b)$$

We also substitute Eq. (3.5.1b) into (S3)

$$s'_{k-1}(x_k) = 3S_{k-1,3}h_{k-1}^2 + 2S_{k-1,2}h_{k-1} + S_{k-1,1} \equiv s'_k(x_k) = S_{k,1}$$

to write

$$S_{k,1} - S_{k-1,1} = h_{k-1}(S_{k,2} - S_{k-1,2}) + 2h_{k-1}S_{k-1,2} = h_{k-1}(S_{k,2} + S_{k-1,2}) \qquad (3.5.3)$$

In order to use this for eliminating $S_{k,1}$ from Eq. (3.5.2), we subtract (3.5.2b) from (3.5.2a) to write

$$h_k(S_{k+1,2} + 2S_{k,2}) - h_{k-1}(S_{k,2} + 2S_{k-1,2}) + 3(S_{k,1} - S_{k-1,1}) = 3(dy_k - dy_{k-1})$$

and then substitute Eq. (3.5.3) into this to write

$$h_k(S_{k+1,2} + 2S_{k,2}) - h_{k-1}(S_{k,2} + 2S_{k-1,2}) + 3h_{k-1}(S_{k,2} + S_{k-1,2}) = 3(dy_k - dy_{k-1})$$
$$h_{k-1}S_{k-1,2} + 2(h_{k-1} + h_k)S_{k,2} + h_kS_{k+1,2} = 3(dy_k - dy_{k-1}) \quad \text{for } k = 1 : N - 1 \qquad (3.5.4)$$

Since these are $N - 1$ equations with respect to $N + 1$ unknowns $\{S_{k,2}, k = 0 : N\}$, we need two more equations from the boundary conditions, as listed in Table 3.4.

How do we convert the boundary conditions into equations? In the case where the first-order derivatives on the two boundary points are given as (i) in Table 3.4, we write Eq. (3.5.2a) for $k = 0$ as

$$h_0(S_{1,2} + 2S_{0,2}) + 3S_{0,1} = 3dy_0, \qquad 2h_0S_{0,2} + h_0S_{1,2} = 3(dy_0 - S_{0,1}) \qquad (3.5.5a)$$

We also write Eq. (3.5.2b) for $k = N$ as

$$h_{N-1}(S_{N,2} + 2S_{N-1,2}) + 3S_{N-1,1} = 3dy_{N-1}$$

and substitute (3.5.3) ($k = N$) into this to write

$$h_{N-1}(S_{N,2} + 2S_{N-1,2}) + 3S_{N,1} - 3h_{N-1}(S_{N,2} + S_{N-1,2}) = 3dy_{N-1}$$
$$h_{N-1}S_{N-1,2} + 2h_{N-1}S_{N,2} = 3(S_{N,1} - dy_{N-1}) \qquad (3.5.5b)$$

Equations (3.5.5a) and (3.5.5b) are the two additional equations that we need to solve Eq. (3.5.4), and that's it. In the case where the second-order derivatives on the two boundary points are given as (ii) in Table 3.4, $S_{0,2}$ and $S_{N,2}$ are directly known from the boundary conditions as

$$S_{0,2} = s''_0(x_0)/2, \qquad S_{N,2} = s''_N(x_N)/2 \qquad (3.5.6)$$
and, subsequently, we have just $N - 1$ unknowns. In the case where the second-order derivatives on the two boundary points are given as (iii) in Table 3.4

$$s''_0(x_0) \equiv s''_1(x_1) + \frac{h_0}{h_1}(s''_1(x_1) - s''_2(x_2))$$
$$s''_N(x_N) \equiv s''_{N-1}(x_{N-1}) + \frac{h_{N-1}}{h_{N-2}}(s''_{N-1}(x_{N-1}) - s''_{N-2}(x_{N-2}))$$

we can instantly convert these into two equations with respect to $S_{0,2}$ and $S_{N,2}$ as

$$h_1S_{0,2} - (h_0 + h_1)S_{1,2} + h_0S_{2,2} = 0 \qquad (3.5.7a)$$
$$h_{N-2}S_{N,2} - (h_{N-1} + h_{N-2})S_{N-1,2} + h_{N-1}S_{N-2,2} = 0 \qquad (3.5.7b)$$

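Finally, we combine the two equations (3.5.5a) and (3.5.5b) with Eq. (3.5.4) to write it in the matrix-vector form as

$$\begin{bmatrix} 2h_0 & h_0 & 0 & \cdots & 0 & 0\\ h_0 & 2(h_0 + h_1) & h_1 & \cdots & 0 & 0\\ \vdots & & \ddots & \ddots & & \vdots\\ 0 & 0 & \cdots & h_{N-2} & 2(h_{N-2} + h_{N-1}) & h_{N-1}\\ 0 & 0 & \cdots & 0 & h_{N-1} & 2h_{N-1} \end{bmatrix}\begin{bmatrix} S_{0,2}\\ S_{1,2}\\ \vdots\\ S_{N-1,2}\\ S_{N,2} \end{bmatrix} = \begin{bmatrix} 3(dy_0 - S_{0,1})\\ 3(dy_1 - dy_0)\\ \vdots\\ 3(dy_{N-1} - dy_{N-2})\\ 3(S_{N,1} - dy_{N-1}) \end{bmatrix} \qquad (3.5.8)$$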

After solving this system of equations for $\{S_{k,2}, k = 0 : N\}$, we substitute them into (S1), (3.5.2), and (3.5.1) to get the other coefficients of the cubic spline as

$$S_{k,0} \overset{(S1)}{=} y_k, \qquad S_{k,1} \overset{(3.5.2)}{=} dy_k - \frac{h_k}{3}(S_{k+1,2} + 2S_{k,2}), \qquad S_{k,3} \overset{(3.5.1)}{=} \frac{S_{k+1,2} - S_{k,2}}{3h_k} \qquad (3.5.9)$$

The MATLAB routine “cspline()” constructs Eq. (3.5.8), solves it to get the cubic spline coefficients for given x, y coordinates of the data points and the boundary conditions, uses the mkpp() routine to get the piecewise polynomial expression, and then uses the ppval() routine to obtain the value(s) of the piecewise polynomial function for xi, that is, the interpolation over xi. The type of the boundary condition is to be specified by the fourth input argument KC. In the case where the boundary condition is given as (i)/(ii) in Table 3.4, the input argument KC should be set to 1/2, and the fifth and sixth input arguments must be the first/second derivatives at the end points. In the case where the boundary condition is given as extrapolated like (iii) in Table 3.4, the input argument KC should be set to 3, and the fifth and sixth input arguments do not need to be fed.






function [yi,S] = cspline(x,y,xi,KC,dy0,dyN)
%This function finds the cubic splines for the input data points (x,y)
%Input: x = [x0 x1 ... xN], y = [y0 y1 ... yN], xi = interpolation points
%       KC = 1/2 for 1st/2nd derivatives on boundary specified
%       KC = 3 for 2nd derivative on boundary extrapolated
%       dy0 = S'(x0) = S01: initial derivative
%       dyN = S'(xN) = SN1: final derivative
%Output: S(n,k); n = 1:N, k = 1,4 in descending order
if nargin < 6, dyN = 0; end, if nargin < 5, dy0 = 0; end
if nargin < 4, KC = 0; end
N = length(x) - 1;
% constructs a set of equations w.r.t. {S(n,2), n = 1:N + 1}
A = zeros(N + 1,N + 1); b = zeros(N + 1,1);
S = zeros(N + 1,4); % Cubic spline coefficient matrix
k = 1:N; h(k) = x(k + 1) - x(k); dy(k) = (y(k + 1) - y(k))./h(k); %elementwise
% Boundary condition
if KC <= 1 %1st derivatives specified
  A(1,1:2) = [2*h(1) h(1)]; b(1) = 3*(dy(1) - dy0); %Eq.(3.5.5a)
  A(N + 1,N:N + 1) = [h(N) 2*h(N)]; b(N + 1) = 3*(dyN - dy(N)); %Eq.(3.5.5b)
elseif KC == 2 %2nd derivatives specified
  A(1,1) = 2; b(1) = dy0; A(N + 1,N + 1) = 2; b(N + 1) = dyN; %Eq.(3.5.6)
else %2nd derivatives extrapolated
  A(1,1:3) = [h(2) -h(1)-h(2) h(1)]; %Eq.(3.5.7a)
  A(N + 1,N - 1:N + 1) = [h(N) -h(N)-h(N - 1) h(N - 1)]; %Eq.(3.5.7b)
end
for m = 2:N %Eq.(3.5.8)
  A(m,m - 1:m + 1) = [h(m - 1) 2*(h(m - 1) + h(m)) h(m)];
  b(m) = 3*(dy(m) - dy(m - 1));
end
S(:,3) = A\b;
% Cubic spline coefficients
for m = 1:N
  S(m,4) = (S(m + 1,3) - S(m,3))/3/h(m); %Eq.(3.5.9)
  S(m,2) = dy(m) - h(m)/3*(S(m + 1,3) + 2*S(m,3));
  S(m,1) = y(m);
end
S = S(1:N,4:-1:1); %descending order
pp = mkpp(x,S); %make piecewise polynomial
yi = ppval(pp,xi); %values of piecewise polynomial ftn


(cf) See Problem 1.11 for the usages of the MATLAB routines mkpp() and ppval(). 

Example 3.3. Cubic Spline. Consider the problem of finding the cubic spline interpolation for the $N + 1 = 4$ data points

$$\{(0, 0), (1, 1), (2, 4), (3, 5)\} \qquad (E3.3.1)$$

subject to the boundary condition

$$s'_0(x_0) = s'_0(0) = S_{0,1} = 2, \qquad s'_N(x_N) = s'_3(3) = S_{3,1} = 2 \qquad (E3.3.2)$$


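With the subinterval widths on the x-axis and the first divided differences as

$$h_0 = h_1 = h_2 = 1, \qquad dy_0 = \frac{y_1 - y_0}{h_0} = 1, \quad dy_1 = \frac{y_2 - y_1}{h_1} = 3, \quad dy_2 = \frac{y_3 - y_2}{h_2} = 1 \qquad (E3.3.3)$$

we write Eq. (3.5.8) as

$$\begin{bmatrix}2 & 1 & 0 & 0\\ 1 & 4 & 1 & 0\\ 0 & 1 & 4 & 1\\ 0 & 0 & 1 & 2\end{bmatrix}\begin{bmatrix}S_{0,2}\\ S_{1,2}\\ S_{2,2}\\ S_{3,2}\end{bmatrix} = \begin{bmatrix}3(dy_0 - S_{0,1})\\ 3(dy_1 - dy_0)\\ 3(dy_2 - dy_1)\\ 3(S_{3,1} - dy_2)\end{bmatrix} = \begin{bmatrix}-3\\ 6\\ -6\\ 3\end{bmatrix} \qquad (E3.3.4)$$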

Then we solve this equation to get

$$S_{0,2} = -3, \quad S_{1,2} = 3, \quad S_{2,2} = -3, \quad S_{3,2} = 3 \qquad (E3.3.5)$$


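and substitute this into Eq. (3.5.9) to obtain

$$S_{0,0} = y_0 = 0, \quad S_{1,0} = y_1 = 1, \quad S_{2,0} = y_2 = 4 \qquad (E3.3.6)$$

$$S_{0,1} = dy_0 - \frac{h_0}{3}(S_{1,2} + 2S_{0,2}) = 1 - \frac{1}{3}(3 + 2 \times (-3)) = 2 \qquad (E3.3.7a)$$
$$S_{1,1} = dy_1 - \frac{h_1}{3}(S_{2,2} + 2S_{1,2}) = 3 - \frac{1}{3}(-3 + 2 \times 3) = 2 \qquad (E3.3.7b)$$
$$S_{2,1} = dy_2 - \frac{h_2}{3}(S_{3,2} + 2S_{2,2}) = 1 - \frac{1}{3}(3 + 2 \times (-3)) = 2 \qquad (E3.3.7c)$$

$$S_{0,3} = \frac{S_{1,2} - S_{0,2}}{3h_0} = \frac{3 - (-3)}{3} = 2 \qquad (E3.3.8a)$$
$$S_{1,3} = \frac{S_{2,2} - S_{1,2}}{3h_1} = \frac{-3 - 3}{3} = -2 \qquad (E3.3.8b)$$
$$S_{2,3} = \frac{S_{3,2} - S_{2,2}}{3h_2} = \frac{3 - (-3)}{3} = 2 \qquad (E3.3.8c)$$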


%do_csplines.m
KC = 1; dy0 = 2; dyN = 2; % with specified 1st derivatives on boundary
x = [0 1 2 3]; y = [0 1 4 5];
xi = x(1) + [0:200]*(x(end) - x(1))/200; %intermediate points
[yi,S] = cspline(x,y,xi,KC,dy0,dyN); S %cubic spline interpolation
clf, plot(x,y,'ko',xi,yi,'k:'), hold on
yi = spline(x,[dy0 y dyN],xi); %for comparison with MATLAB built-in ftn
pause, plot(x,y,'bo',xi,yi,'b')
KC = 3; [yi,S] = cspline(x,y,xi,KC); %with the 2nd derivatives extrapolated
pause, plot(x,y,'ko',xi,yi,'k')
yi = spline(x,y,xi); %for comparison with MATLAB built-in ftn
pause, plot(x,y,'bo',xi,yi,'b')
Finally, we can write the cubic spline equations collectively from (S0) as

$$s_0(x) = S_{0,3}(x - x_0)^3 + S_{0,2}(x - x_0)^2 + S_{0,1}(x - x_0) + S_{0,0} = 2x^3 - 3x^2 + 2x + 0$$
$$s_1(x) = S_{1,3}(x - x_1)^3 + S_{1,2}(x - x_1)^2 + S_{1,1}(x - x_1) + S_{1,0} = -2(x - 1)^3 + 3(x - 1)^2 + 2(x - 1) + 1$$
$$s_2(x) = S_{2,3}(x - x_2)^3 + S_{2,2}(x - x_2)^2 + S_{2,1}(x - x_2) + S_{2,0} = 2(x - 2)^3 - 3(x - 2)^2 + 2(x - 2) + 4$$

We make and run the program “do_csplines.m”, which uses the routine “cspline()” to compute the cubic spline coefficients $\{S_{k,3}, S_{k,2}, S_{k,1}, S_{k,0}, k = 0 : N - 1\}$ and obtain the value(s) of the cubic spline function for xi (i.e., the interpolation over xi), and then plots the result as depicted in Fig. 3.7. We also compare this result with that obtained by using the MATLAB built-in function “spline(x,y,xi)”, which works with the boundary condition of type (i) when the second input argument is given as [dy0 y dyN], and with the boundary condition of type (iii) when x and y have the same length.

>>do_csplines %cubic spline
S =  2.0000  -3.0000   2.0000        0
    -2.0000   3.0000   2.0000   1.0000
     2.0000  -3.0000   2.0000   4.0000
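To see that the result really satisfies the constraints (S2)-(S4), we can evaluate each piece and its first two derivatives at the interior knots. The sketch below is our own check, not part of do_csplines.m; it types in the coefficient matrix S printed above (rows in descending powers of $(x - x_k)$, as returned by cspline()) and uses polyval() and polyder() in local coordinates, so the left piece is evaluated at $h_k$ and the right piece at 0.

```matlab
% Check continuity of s, s', s'' at the interior knots for the spline of Example 3.3
x = [0 1 2 3];
S = [ 2 -3 2 0;          % row k+1: [S_k3 S_k2 S_k1 S_k0], descending powers of (x - x_k)
     -2  3 2 1;
      2 -3 2 4];
N = length(x) - 1;
maxJump = 0;
for k = 1:N - 1                                  % interior knots x_1 ... x_(N-1)
  h = x(k + 1) - x(k);                           % right end of the left piece, local coordinate
  p = S(k,:); q = S(k + 1,:);                    % pieces to the left and right of the knot
  jumps = [polyval(p,h) - polyval(q,0), ...                                 (S2)
           polyval(polyder(p),h) - polyval(polyder(q),0), ...               (S3)
           polyval(polyder(polyder(p)),h) - polyval(polyder(polyder(q)),0)]; % (S4)
  fprintf('knot x = %g: jumps in s, s'', s'''' = %g %g %g\n', x(k + 1), jumps)
  maxJump = max(maxJump, max(abs(jumps)));
end
dL = polyval(polyder(S(1,:)),0);                 % s'_0(x_0), should equal S_01 = 2
dR = polyval(polyder(S(N,:)),x(N + 1) - x(N));   % slope at x_N, should equal S_N1 = 2
fprintf('end slopes: %g %g\n', dL, dR)
```

All jumps come out as zero and both end slopes equal 2, which is exactly the boundary condition (E3.3.2).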

3.6 HERMITE INTERPOLATING POLYNOMIAL 

In some cases, we need to find the polynomial function that not only passes 
through the given points, but also has the specified derivatives at every data 
point. We call such a polynomial the Hermite interpolating polynomial or the 
osculating polynomial. 
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For simplicity, we consider a third-order polynomial 

h(x) = H 3 x 3 + H 2 x 2 + H lX + H 0 (3.6.1) 

matching just two points (% 0 , >’ 0 ), {x\, >’| ) and having the specified first derivatives 
y' 0 , y[ at the points. We can obtain the four coefficients H 3 , H 2 . H\, H 0 by solving 


$$
\begin{aligned}
h(x_0) &= H_3x_0^3 + H_2x_0^2 + H_1x_0 + H_0 = y_0\\
h(x_1) &= H_3x_1^3 + H_2x_1^2 + H_1x_1 + H_0 = y_1\\
h'(x_0) &= 3H_3x_0^2 + 2H_2x_0 + H_1 = y'_0\\
h'(x_1) &= 3H_3x_1^2 + 2H_2x_1 + H_1 = y'_1
\end{aligned}
\tag{3.6.2}
$$
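The four conditions in Eq. (3.6.2) form a 4 × 4 linear system in H_3, H_2, H_1, H_0. The Python sketch below mimics what hermit() does with A\b. It assumes plain Gaussian elimination with partial pivoting as a stand-in for MATLAB's backslash operator; the helper names are our own.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for cc in range(c, n + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        s = sum(M[r][cc] * x[cc] for cc in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def hermite_coeffs(x0, y0, dy0, x1, y1, dy1):
    """[H3, H2, H1, H0] satisfying the four conditions of Eq. (3.6.2)."""
    A = [[x0**3, x0**2, x0, 1],
         [x1**3, x1**2, x1, 1],
         [3 * x0**2, 2 * x0, 1, 0],
         [3 * x1**2, 2 * x1, 1, 0]]
    return solve(A, [y0, y1, dy0, dy1])


print(hermite_coeffs(0, 0, 2, 1, 1, 2))   # close to [2, -3, 2, 0]
```

The call in the last line should reproduce the first spline piece s_0(x) = 2x^3 − 3x^2 + 2x. That cubic also satisfies h(0) = 0, h(1) = 1 and has slope 2 at both ends, and the cubic meeting these four conditions is unique.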

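As an alternative, we approximate the specified derivatives at the data points by their differences

$$ y'_0 \approx \frac{h(x_0 + \varepsilon) - h(x_0)}{\varepsilon} = \frac{y_2 - y_0}{\varepsilon}, \qquad y'_1 \approx \frac{h(x_1) - h(x_1 - \varepsilon)}{\varepsilon} = \frac{y_1 - y_3}{\varepsilon} \tag{3.6.3} $$

and find the Lagrange/Newton polynomial matching the four points

$$ (x_0, y_0),\ (x_2 = x_0 + \varepsilon,\ y_2 = y_0 + y'_0\varepsilon),\ (x_3 = x_1 - \varepsilon,\ y_3 = y_1 - y'_1\varepsilon),\ (x_1, y_1) \tag{3.6.4} $$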
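The sketch below checks that the difference-based construction of Eqs. (3.6.3) and (3.6.4) approaches the exact Hermite polynomial as ε shrinks. It builds the Newton polynomial through the four points of Eq. (3.6.4) with divided differences, then compares it on [0, 1] against h(x) = 2x^3 − 3x^2 + 2x. That cubic is the exact Hermite polynomial for (x_0, y_0, y'_0) = (0, 0, 2) and (x_1, y_1, y'_1) = (1, 1, 2). The test data and helper names are illustrative assumptions.

```python
def newton_coeffs(xs, ys):
    """Divided-difference coefficients a_0, ..., a_n of the Newton polynomial."""
    a = list(ys)
    n = len(xs)
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            a[i] = (a[i] - a[i - 1]) / (xs[i] - xs[i - j])
    return a


def newton_eval(a, xs, x):
    """Nested evaluation of a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + ..."""
    p = a[-1]
    for k in range(len(a) - 2, -1, -1):
        p = p * (x - xs[k]) + a[k]
    return p


def approx_hermite(x0, y0, dy0, x1, y1, dy1, eps):
    """Newton polynomial through the four points of Eq. (3.6.4)."""
    xs = [x0, x0 + eps, x1 - eps, x1]
    ys = [y0, y0 + dy0 * eps, y1 - dy1 * eps, y1]
    return newton_coeffs(xs, ys), xs


def exact(x):
    return 2 * x**3 - 3 * x**2 + 2 * x


errors = {}
for eps in (1e-1, 1e-2, 1e-3):
    a, xs = approx_hermite(0, 0, 2, 1, 1, 2, eps)
    errors[eps] = max(abs(newton_eval(a, xs, i / 100) - exact(i / 100))
                      for i in range(101))
print(errors)   # the maximum error shrinks roughly in proportion to eps
```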

The MATLAB routine “hermit()” constructs Eq. (3.6.2) and solves it to get the Hermite interpolating polynomial coefficients for a single interval. It takes the two end points and the derivatives at them as the input arguments. The next routine, “hermits()”, uses “hermit()” to get the Hermite coefficients for a set of multiple subintervals.


function H = hermit(x0,y0,dy0,x1,y1,dy1)
A = [x0^3 x0^2 x0 1; x1^3 x1^2 x1 1;
     3*x0^2 2*x0 1 0; 3*x1^2 2*x1 1 0];
b = [y0 y1 dy0 dy1]'; %Eq.(3.6.2)
H = (A\b)';

function H = hermits(x,y,dy)
% finds Hermite interpolating polynomials for multiple subintervals
%Input : x,y,dy - points and derivatives at the points
%Output: H = coefficients of cubic Hermite interpolating polynomials
for n = 1:length(x)-1
  %each subinterval is shifted to start at 0 (local coordinate x - x(n))
  H(n,:) = hermit(0,y(n),dy(n),x(n + 1)-x(n),y(n + 1),dy(n + 1));
end




Example 3.4. Hermite Interpolating Polynomial. Consider the problem of finding the polynomial interpolation for the N + 1 = 4 data points

$$ \{(0, 0), (1, 1), (2, 4), (3, 5)\} \tag{E3.4.1} $$

subject to the conditions

$$ h'_0(x_0) = h'_0(0) = 2,\quad h'_1(1) = 0,\quad h'_2(2) = 0,\quad h'_N(x_N) = h'_3(3) = 2 \tag{E3.4.2} $$

For this problem, we only have to type the following statements in the MATLAB command window.

>>x = [0 1 2 3]; y = [0 1 4 5]; dy = [2 0 0 2]; xi = [0:0.01:3];
>>H = hermits(x,y,dy); yi = ppval(mkpp(x,H), xi);
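The Python sketch below repeats the Example 3.4 computation for readers without MATLAB at hand. It differs from the MATLAB version in two ways:

- Instead of calling a linear solver as hermit() does, it uses the closed-form solution of Eq. (3.6.2) with the local origin x_0 = 0 and interval length h. This closed form is our own derivation.
- It adds a small ppval()-like evaluator.

Both function names are hypothetical stand-ins for the MATLAB routines.

```python
def hermits(x, y, dy):
    """Rows [H3, H2, H1, H0] per subinterval, in the local variable u = x - x_n.

    Closed-form solution of Eq. (3.6.2) with x0 = 0 and x1 = h:
      H0 = y_n, H1 = y'_n, H2 = (3s - 2y'_n - y'_{n+1})/h,
      H3 = (y'_n + y'_{n+1} - 2s)/h^2, where s = (y_{n+1} - y_n)/h.
    """
    H = []
    for n in range(len(x) - 1):
        h = x[n + 1] - x[n]
        s = (y[n + 1] - y[n]) / h              # slope of the chord
        H3 = (dy[n] + dy[n + 1] - 2 * s) / h**2
        H2 = (3 * s - 2 * dy[n] - dy[n + 1]) / h
        H.append([H3, H2, dy[n], y[n]])
    return H


def ppval(x, H, xq):
    """Evaluate the piecewise cubic at xq, like ppval(mkpp(x,H),xq) in MATLAB."""
    n = 0
    while n < len(H) - 1 and xq >= x[n + 1]:
        n += 1
    u = xq - x[n]
    a3, a2, a1, a0 = H[n]
    return ((a3 * u + a2) * u + a1) * u + a0


x, y, dy = [0, 1, 2, 3], [0, 1, 4, 5], [2, 0, 0, 2]   # (E3.4.1), (E3.4.2)
H = hermits(x, y, dy)
print(H)   # [[0.0, -1.0, 2, 0], [-6.0, 9.0, 0, 1], [0.0, 1.0, 0, 4]]
yi = [ppval(x, H, k / 100) for k in range(301)]      # xi = 0:0.01:3
```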

3.7 TWO-DIMENSIONAL INTERPOLATION 

In this section we deal with only the simplest way of two-dimensional interpolation, that is, a generalization of piecewise linear interpolation called bilinear interpolation. The bilinear interpolation for a point (x, y) on the rectangular sub-region having (x_{m−1}, y_{n−1}) and (x_m, y_n) as its left-upper/right-lower corner points is described by the following formula.

$$
\begin{aligned}
z(x, y_{n-1}) &= \frac{x_m - x}{x_m - x_{m-1}}\,z_{m-1,n-1} + \frac{x - x_{m-1}}{x_m - x_{m-1}}\,z_{m,n-1}\\
z(x, y_n) &= \frac{x_m - x}{x_m - x_{m-1}}\,z_{m-1,n} + \frac{x - x_{m-1}}{x_m - x_{m-1}}\,z_{m,n}
\end{aligned}
\tag{3.7.1a}
$$

$$ z(x, y) = \frac{y_n - y}{y_n - y_{n-1}}\,z(x, y_{n-1}) + \frac{y - y_{n-1}}{y_n - y_{n-1}}\,z(x, y_n) \tag{3.7.1b} $$

$$
\begin{aligned}
z(x, y) = \frac{1}{(x_m - x_{m-1})(y_n - y_{n-1})}\big\{&(x_m - x)(y_n - y)\,z_{m-1,n-1} + (x - x_{m-1})(y_n - y)\,z_{m,n-1}\\
&+ (x_m - x)(y - y_{n-1})\,z_{m-1,n} + (x - x_{m-1})(y - y_{n-1})\,z_{m,n}\big\}\\
&\text{for } x_{m-1} \le x \le x_m,\ y_{n-1} \le y \le y_n
\end{aligned}
\tag{3.7.2}
$$

Figure 3.8 A two-dimensional interpolation using Zi = interp2() on the grid points [Xi,Yi] generated by the meshgrid() command.
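Equations (3.7.1a) and (3.7.1b) interpolate first along x and then along y. Eq. (3.7.2) gives the same value as a single weighted sum. The Python sketch below codes both forms for one cell and checks them against a bilinear test function a + bx + cy + dxy, which bilinear interpolation reproduces exactly. The function names and the test function are illustrative.

```python
def lerp(t0, t1, f0, f1, t):
    """Linear interpolation between (t0, f0) and (t1, f1)."""
    return (t1 - t) / (t1 - t0) * f0 + (t - t0) / (t1 - t0) * f1


def bilinear_two_step(xm1, xm, yn1, yn, z, x, y):
    """Eqs. (3.7.1a)-(3.7.1b); z[i, j] holds z_{m-1+i, n-1+j} for i, j in {0, 1}."""
    z_lower = lerp(xm1, xm, z[0, 0], z[1, 0], x)   # z(x, y_{n-1})
    z_upper = lerp(xm1, xm, z[0, 1], z[1, 1], x)   # z(x, y_n)
    return lerp(yn1, yn, z_lower, z_upper, y)


def bilinear_direct(xm1, xm, yn1, yn, z, x, y):
    """Eq. (3.7.2): the same value written as one weighted sum."""
    return ((xm - x) * (yn - y) * z[0, 0] + (x - xm1) * (yn - y) * z[1, 0]
            + (xm - x) * (y - yn1) * z[0, 1] + (x - xm1) * (y - yn1) * z[1, 1]) \
        / ((xm - xm1) * (yn - yn1))


def f(x, y):
    return 1 + 2 * x - 3 * y + 0.5 * x * y    # a bilinear test function


xm1, xm, yn1, yn = 1.0, 2.0, -1.0, 0.5
z = {(i, j): f([xm1, xm][i], [yn1, yn][j]) for i in (0, 1) for j in (0, 1)}
x, y = 1.3, 0.2
print(bilinear_two_step(xm1, xm, yn1, yn, z, x, y),
      bilinear_direct(xm1, xm, yn1, yn, z, x, y), f(x, y))
```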


function Zi = intrp2(x,y,Z,xi,yi)
%To interpolate Z(x,y) on (xi,yi)
M = length(x); N = length(y);
Mi = length(xi); Ni = length(yi);
for mi = 1:Mi
  for ni = 1:Ni
    for m = 2:M
      for n = 2:N
        break1 = 0;
        if xi(mi) <= x(m) & yi(ni) <= y(n)
          tmp = (x(m) - xi(mi))*(y(n) - yi(ni))*Z(n - 1,m - 1)...
               +(xi(mi) - x(m - 1))*(y(n) - yi(ni))*Z(n - 1,m)...
               +(x(m) - xi(mi))*(yi(ni) - y(n - 1))*Z(n,m - 1)...
               +(xi(mi) - x(m - 1))*(yi(ni) - y(n - 1))*Z(n,m);
          Zi(ni,mi) = tmp/(x(m) - x(m - 1))/(y(n) - y(n - 1)); %Eq.(3.7.2)
          break1 = 1;
        end
        if break1 > 0, break, end
      end
      if break1 > 0, break, end
    end
  end
end


This formula is cast into the MATLAB routine “intrp2()”, which is so named in order to distinguish it from the MATLAB built-in routine “interp2()”. Note that in reference to Fig. 3.8, the given values of data at grid points (x(m),y(n)) and the interpolated values for intermediate points (xi(m),yi(n)) are stored in Z(n,m) and Zi(n,m), respectively.






%do_interp2.m
% 2-dimensional interpolation for Ex 3.5
xi = -2:0.1:2; yi = -2:0.1:2;
[Xi,Yi] = meshgrid(xi,yi);
Z0 = Xi.^2 + Yi.^2; %(E3.5.1)
subplot(131), mesh(Xi,Yi,Z0)
x = -2:0.5:2; y = -2:0.5:2;
[X,Y] = meshgrid(x,y);
Z = X.^2 + Y.^2;
subplot(132), mesh(X,Y,Z)
Zi = interp2(x,y,Z,Xi,Yi); %built-in routine
subplot(133), mesh(xi,yi,Zi)
Zi = intrp2(x,y,Z,xi,yi); %our own routine
pause, mesh(xi,yi,Zi)
norm(Z0 - Zi)/norm(Z0)


Example 3.5. Two-Dimensional Bilinear Interpolation. We consider interpolating the sample values of a function

$$ f(x, y) = x^2 + y^2 \tag{E3.5.1} $$

for the 9 × 9 grid over the 41 × 41 grid on the domain D = {(x, y) | −2 ≤ x ≤ 2, −2 ≤ y ≤ 2}.

We make the MATLAB program “do_interp2.m”, which does three things:

- it uses the routine “intrp2()” to do this job;
- it compares the results with those of the MATLAB built-in routine “interp2()”;
- it computes a kind of relative error to estimate how close the interpolated values are to the original values.

The graphic results of running this program are depicted in Fig. 3.9. They show that we obtained a reasonable approximation, with an error of 2.6%, from less than 1/16 of the original data. This implies that sampling may serve as a simple data compression method, as long as the interpolated data are little impaired.
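The pure-Python sketch below reproduces the Example 3.5 experiment outside MATLAB. It follows the indexing convention of intrp2(), in which Z[n][m] holds the value at (x[m], y[n]). Unlike intrp2(), it locates each cell with a binary search instead of nested loops.

One assumption to flag: the relative error here uses the Frobenius norm, whereas MATLAB's norm() of a matrix is the 2-norm. The number it prints therefore need not match the 2.6% quoted above.

```python
import bisect


def interp2_bilinear(x, y, Z, xq, yq):
    """Bilinear interpolation of Z[n][m] = f(x[m], y[n]); returns Zi[ni][mi]."""
    Zi = []
    for yv in yq:
        n = min(max(bisect.bisect_left(y, yv), 1), len(y) - 1)  # bracketing cell, clamped
        row = []
        for xv in xq:
            m = min(max(bisect.bisect_left(x, xv), 1), len(x) - 1)
            x0, x1, y0, y1 = x[m - 1], x[m], y[n - 1], y[n]
            row.append(((x1 - xv) * (y1 - yv) * Z[n - 1][m - 1]
                        + (xv - x0) * (y1 - yv) * Z[n - 1][m]
                        + (x1 - xv) * (yv - y0) * Z[n][m - 1]
                        + (xv - x0) * (yv - y0) * Z[n][m]) / ((x1 - x0) * (y1 - y0)))
        Zi.append(row)
    return Zi


def f(u, v):
    return u**2 + v**2                          # (E3.5.1)


x = [-2 + 0.5 * k for k in range(9)]            # 9 sample points per axis
xq = [-2 + 0.1 * k for k in range(41)]          # 41 interpolation points per axis
Z = [[f(xm, yn) for xm in x] for yn in x]
Zi = interp2_bilinear(x, x, Z, xq, xq)
Z0 = [[f(xm, yn) for xm in xq] for yn in xq]
num = sum((a - b)**2 for r0, ri in zip(Z0, Zi) for a, b in zip(r0, ri))
den = sum(a**2 for r0 in Z0 for a in r0)
rel_err = (num / den) ** 0.5                     # relative Frobenius-norm error
print(rel_err)
```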

3.8 CURVE FITTING 

When many sample data pairs {(x_k, y_k), k = 0 : M} are available, we often need to grasp the relationship between the two variables or to describe the trend of the data, hopefully in the form of a function y = f(x). But, as mentioned in Remark 3.1, the polynomial approach runs into the polynomial wiggle and/or the Runge phenomenon, which makes it unattractive for approximation purposes. The cubic spline approach may offer a roundabout way to smoothness, as explained in Section 3.5. However, every subinterval needs four coefficients, so the spline has too many parameters and does not seem to be an efficient way of describing the relationship or the trend.

What other choices do we have? Since many data are susceptible to some error, we don't have to find a function passing exactly through every point. Instead of pursuing an exact match at every data point, we look for an approximate function (not necessarily a polynomial) that describes the data points as a whole with the smallest error in some sense. This is called curve fitting.

Figure 3.9 Two-dimensional interpolation (Example 3.5): (a) true function, (b) the function over the sample grid, (c) bilinear interpolation.

As a reasonable means, we consider the least-squares (LS) approach to minimizing the sum of squared errors, where the error is described by the vertical distance to the curve from the data points. We will look over various types of fitting functions in this section.

3.8.1 Straight Line Fit: A Polynomial Function of First Degree 

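If there is some theoretical basis on which we believe the relationship between the two variables to be

$$ \theta_1x + \theta_0 = y \tag{3.8.1} $$

we should set up the following system of equations from the collection of many experimental data:

$$
\begin{aligned}
\theta_1x_1 + \theta_0 &= y_1\\
\theta_1x_2 + \theta_0 &= y_2\\
&\ \,\vdots\\
\theta_1x_M + \theta_0 &= y_M
\end{aligned}
\quad\Longrightarrow\quad
A\theta = y \ \text{ with } A = \begin{bmatrix} x_1 & 1\\ x_2 & 1\\ \vdots & \vdots\\ x_M & 1 \end{bmatrix},\ 
\theta = \begin{bmatrix} \theta_1\\ \theta_0 \end{bmatrix},\ 
y = \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_M \end{bmatrix}
\tag{3.8.2}
$$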

Noting that this apparently corresponds to the overdetermined case mentioned in Section 2.1.3, we resort to the least-squares (LS) solution (2.1.10)

$$ \theta^o = [A^TA]^{-1}A^Ty \tag{3.8.3} $$

which minimizes the objective function

$$ J = \|e\|^2 = \|A\theta - y\|^2 = [A\theta - y]^T[A\theta - y] \tag{3.8.4} $$
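For the straight-line model, [A^TA]^{-1}A^Ty in Eq. (3.8.3) reduces to the familiar closed form below. This is a standard result, written out here rather than taken from the book's code. The Python sketch applies it to the xy1.dat data of Example 3.6. It then checks the defining property of the LS solution: the residual e = Aθ − y is orthogonal to both columns of A.

```python
def ls_line(x, y):
    """theta1, theta0 minimizing sum (theta1*x_k + theta0 - y_k)^2, Eq. (3.8.3)."""
    M = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xk * xk for xk in x)
    sxy = sum(xk * yk for xk, yk in zip(x, y))
    det = M * sxx - sx * sx                 # determinant of A^T A
    theta1 = (M * sxy - sx * sy) / det
    theta0 = (sxx * sy - sx * sxy) / det
    return theta1, theta0


x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [-0.2774, 0.8958, -1.5651, 3.4565, 3.0601, 4.8568, 3.8982]
th1, th0 = ls_line(x, y)
e = [th1 * xk + th0 - yk for xk, yk in zip(x, y)]
print(th1, th0)
print(sum(ek * xk for ek, xk in zip(e, x)), sum(e))   # both ~0: A^T e = 0
```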
Sometimes we have information about the error bounds of the data. It is then reasonable to differentiate among the data by weighting each one more or less heavily according to its accuracy/reliability. This policy can be implemented by the weighted least-squares (WLS) solution

$$ \theta_W^o = \begin{bmatrix} \theta_{W1}^o\\ \theta_{W0}^o \end{bmatrix} = [A^TWA]^{-1}A^TWy \tag{3.8.5} $$

which minimizes the weighted objective function

$$ J_W = [A\theta - y]^TW[A\theta - y] \tag{3.8.6} $$

If the weighting matrix is W = V^{-1} = R^{-T}R^{-1}, then we can write the WLS solution (3.8.5) as

$$ \theta_W^o = [(R^{-1}A)^T(R^{-1}A)]^{-1}(R^{-1}A)^TR^{-1}y = [A_R^TA_R]^{-1}A_R^Ty_R \tag{3.8.7} $$

where

$$ A_R = R^{-1}A,\quad y_R = R^{-1}y,\quad W = V^{-1} = R^{-T}R^{-1} \tag{3.8.8} $$


One may use the MATLAB built-in routine “lscov(A,y,V)” to obtain this 
WLS solution. 
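With a diagonal R = diag(r_1, ..., r_M), Eq. (3.8.7) says the WLS estimate is simply the ordinary LS estimate computed after each row of A and y is divided by r_m. The Python sketch below computes the straight-line WLS estimate in two ways and checks that they agree:

- via the weighted normal equations (3.8.5) with W = diag(1/r_m^2);
- via the scaled rows of (3.8.7) and (3.8.8).

The helper names are our own. The Example 3.7(a) data, with r set to the gauges' error bounds, serve purely as test input.

```python
def solve2(a11, a12, a21, a22, b1, b2):
    """Solve a 2 x 2 linear system by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def wls_line_normal(x, y, r):
    """Eq. (3.8.5) for A = [x 1] and W = diag(1/r_m^2)."""
    w = [1 / rk**2 for rk in r]
    swxx = sum(wk * xk * xk for wk, xk in zip(w, x))
    swx = sum(wk * xk for wk, xk in zip(w, x))
    sw = sum(w)
    swxy = sum(wk * xk * yk for wk, xk, yk in zip(w, x, y))
    swy = sum(wk * yk for wk, yk in zip(w, y))
    return solve2(swxx, swx, swx, sw, swxy, swy)


def wls_line_scaled(x, y, r):
    """Eqs. (3.8.7)-(3.8.8): plain LS on the rows of A and y divided by r_m."""
    AR = [(xk / rk, 1 / rk) for xk, rk in zip(x, r)]
    yR = [yk / rk for yk, rk in zip(y, r)]
    g11 = sum(a * a for a, _ in AR)                    # entries of A_R^T A_R
    g12 = sum(a * b for a, b in AR)
    g22 = sum(b * b for _, b in AR)
    h1 = sum(a * yk for (a, _), yk in zip(AR, yR))     # entries of A_R^T y_R
    h2 = sum(b * yk for (_, b), yk in zip(AR, yR))
    return solve2(g11, g12, g12, g22, h1, h2)


x = [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
y = [0.0831, 0.9290, 2.4932, 4.9292, 7.9605,
     0.9536, 2.4836, 3.4173, 6.3903, 10.2443]
r = [0.2] * 5 + [1.0] * 5        # error bounds of gauges A and B
print(wls_line_normal(x, y, r))
print(wls_line_scaled(x, y, r))   # same estimate up to rounding
```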


3.8.2 Polynomial Curve Fit: A Polynomial Function of Higher Degree 

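If there is no reason to limit the degree of the fitting polynomial to one, then we may increase the degree of the fitting polynomial to, say, N in the expectation of decreasing the error. We can still use Eq. (3.8.4) or (3.8.6), but with different definitions of A and θ as

$$ A = \begin{bmatrix} x_1^N & \cdots & x_1 & 1\\ x_2^N & \cdots & x_2 & 1\\ \vdots & & \vdots & \vdots\\ x_M^N & \cdots & x_M & 1 \end{bmatrix},\qquad \theta = \begin{bmatrix} \theta_N\\ \vdots\\ \theta_1\\ \theta_0 \end{bmatrix} \tag{3.8.9} $$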


The MATLAB routine “polyfits()” performs the WLS or LS scheme to find the coefficients of a polynomial fitting a given set of data points. Which scheme it uses depends on whether a vector r is given as the fourth or fifth input argument. This vector holds the diagonal elements of R in Eq. (3.8.8), so that W = diag(1/r_m^2). Note that for a diagonal weighting matrix W, the WLS solution conforms to the LS solution with each row of the information matrix A and the data vector y divided by the corresponding element of r, that is, multiplied by the square root of the corresponding diagonal element of W. Let us see the following examples for its usage:
function [th,err,yi] = polyfits(x,y,N,xi,r)
%x,y : the row vectors of data pairs
%N : the order of polynomial(>=0)
%r : reverse weighting factor array of the same dimension as y
M = length(x); x = x(:); y = y(:); %Make all column vectors
y0 = y; %keep the unweighted data for the error estimate
if nargin == 4
  if length(xi) == M, r = xi; xi = x; %With input argument (x,y,N,r)
  else r = 1; %With input argument (x,y,N,xi)
  end
elseif nargin == 3, xi = x; r = 1; % With input argument (x,y,N)
end
A(:,N + 1) = ones(M,1);
for n = N:-1:1, A(:,n) = A(:,n+1).*x; end %Eq.(3.8.9)
if length(r) == M
  for m = 1:M, A(m,:) = A(m,:)/r(m); y(m) = y(m)/r(m); end %Eq.(3.8.8)
end
th = (A\y)' %Eq.(3.8.3) or (3.8.7)
ye = polyval(th,x); err = norm(y0 - ye)/norm(y0); %estimated y values, error
yi = polyval(th,xi);


%do_polyfit
load xy1.dat
x = xy1(:,1); y = xy1(:,2);
[x,i] = sort(x); y = y(i); %sort the data for plotting
xi = min(x)+[0:100]/100*(max(x) - min(x)); %intermediate points
for i = 1:4
  [th,err,yi] = polyfits(x,y,2*i - 1,xi); err %LS
  subplot(220+i)
  plot(x,y,'k*',xi,yi,'b:')
end


%xy1.dat
-3.0 -0.2774
-2.0  0.8958
-1.0 -1.5651
 0.0  3.4565
 1.0  3.0601
 2.0  4.8568
 3.0  3.8982


Example 3.6. Polynomial Curve Fit by LS (Least Squares). Suppose we have an ASCII data file “xy1.dat” containing a set of data pairs {(x_k, y_k), k = 0:6} in two columns, and we must fit these data into polynomials of degree 1, 3, 5, and 7.

x_k     -3        -2        -1         0         1         2         3
y_k   -0.2774    0.8958   -1.5651    3.4565    3.0601    4.8568    3.8982

We make the MATLAB program “do_polyfit.m”, which uses the routine “polyfits()” to do this job and plots the results together with the given data points as depicted in Fig. 3.10. We can observe the polynomial wiggle: the oscillation of the fitting curve between the data points becomes more pronounced with higher degree.

Figure 3.10 Polynomial curve fitting by the LS (Least-Squares) method: (a) polynomial of degree 1, (b) polynomial of degree 3, (c) polynomial of degree 5, (d) polynomial of degree 7.
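The Python sketch below covers the LS part of polyfits(). It builds the matrix A of Eq. (3.8.9) and reports err = ||y − ŷ||/||y|| as polyfits() does.

It assumes the normal equations [A^TA]θ = A^Ty are solved by Gaussian elimination. MATLAB's A\y instead uses a QR-based least-squares solve for a rectangular A, which is numerically safer at high degree.

Degree 7 is left out on purpose. With only seven data points there are eight unknowns, so A^TA is singular and the normal equations no longer have a unique solution.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (square system)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for cc in range(c, n + 1):
                M[r][cc] -= f * M[c][cc]
    th = [0.0] * n
    for r in range(n - 1, -1, -1):
        th[r] = (M[r][n] - sum(M[r][cc] * th[cc] for cc in range(r + 1, n))) / M[r][r]
    return th


def polyval(th, x):
    """Evaluate th[0]*x^N + ... + th[N] by Horner's rule (MATLAB ordering)."""
    p = 0.0
    for c in th:
        p = p * x + c
    return p


def ls_polyfit(x, y, N):
    """LS coefficients [theta_N, ..., theta_0] and the relative error."""
    A = [[xk ** (N - j) for j in range(N + 1)] for xk in x]      # Eq. (3.8.9)
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(N + 1)]
           for i in range(N + 1)]
    Aty = [sum(row[i] * yk for row, yk in zip(A, y)) for i in range(N + 1)]
    th = solve(AtA, Aty)
    res = sum((yk - polyval(th, xk)) ** 2 for xk, yk in zip(x, y)) ** 0.5
    return th, res / sum(yk * yk for yk in y) ** 0.5


x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]                        # xy1.dat
y = [-0.2774, 0.8958, -1.5651, 3.4565, 3.0601, 4.8568, 3.8982]
errs = {N: ls_polyfit(x, y, N)[1] for N in (1, 3, 5)}
print(errs)   # the error cannot increase as the degree grows
```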

Example 3.7. Curve Fitting by WLS (Weighted Least Squares). Most experimental data have some absolute and/or relative error bounds that are not uniform for all data. If we know the error bound for each datum, we may give each datum a weight inversely proportional to the size of its error bound when extracting valuable information from the data. The WLS solution (3.8.7) enables us to reflect such a weighting strategy in estimating data trends. Consider the following two cases.

(a) Suppose there are two gauges A and B with the same function, but different absolute error bounds ±0.2 and ±1.0, respectively. We used them to get the input-output data pairs (x_m, y_m) as

{(1, 0.0831), (3, 0.9290), (5, 2.4932), (7, 4.9292), (9, 7.9605)} from gauge A
{(2, 0.9536), (4, 2.4836), (6, 3.4173), (8, 6.3903), (10, 10.2443)} from gauge B

Let the fitting function be a second-degree polynomial function

$$ y = a_2x^2 + a_1x + a_0 \tag{E3.7.1} $$
To find the parameters a_2, a_1, and a_0, we write the MATLAB program “do_wlse1.m”. It uses the routine “polyfits()” twice, once without weighting coefficients and once with weighting coefficients. The results are depicted in Fig. 3.11a. The WLS curve fitting tries to be closer to the data points with smaller error bounds. The LS curve fitting weights all data points equally, which may result in larger deviations from data points with small error bounds.

Figure 3.11 LS curve fitting and WLS curve fitting for Example 3.7.

(b) Suppose we use one gauge that has a relative error bound of ±40[%] for measuring the output y for the input values x = [1, 3, 5, ..., 19]. The size of the error bound of each output datum is then proportional to the magnitude of the output. We used it to get the input-output data pairs (x_m, y_m) as

{(1, 4.7334), (3, 2.1873), (5, 3.0067), (7, 1.4273), (9, 1.7787),
(11, 1.2301), (13, 1.6052), (15, 1.5353), (17, 1.3985), (19, 2.0211)}

Let the fitting function be a power function

$$ y = ax^b \tag{E3.7.2} $$

To find the parameters a and b, we make the MATLAB program “do_wlse2.m”. It uses the routine “curve_fit()” once without the weighting coefficients and once with them. The results depicted in Fig. 3.11b show that the WLS curve fitting tries to get closer to the data points with smaller |y|. The LS curve fitting pays equal respect to all data points, which may result in larger deviations from data points with small |y|. Note that the MATLAB routine “curve_fit()” appears in Problem 3.11; it implements all of the schemes listed in Table 3.5 with the LS/WLS solution.
(cf) Note that the objective of the WLS scheme is to put greater emphasis on more reliable data.


%do_wlse1 for Ex.3.7
clear, clf
x = [1 3 5 7 9 2 4 6 8 10]; %input data
y = [0.0831 0.9290 2.4932 4.9292 7.9605 ...
     0.9536 2.4836 3.4173 6.3903 10.2443]; %output data
eb = [0.2*ones(1,5) ones(1,5)]; %error bound for each y
[x,i] = sort(x); y = y(i); eb = eb(i); %sort the data for plotting
errorbar(x,y,eb,':'), hold on
N = 2; %the degree of the approximate polynomial
xi = [0:100]/10; %interpolation points
[th1,err1,y1] = polyfits(x,y,N,xi); %LS
[thw1,errw1,yw1] = polyfits(x,y,N,xi,eb); %WLS
plot(xi,y1,'b', xi,yw1,'r')
%KC = 0; th1c = curve_fit(x,y,KC,N,xi); %for cross-check
%thw1c = curve_fit(x,y,KC,N,xi,eb);


%do_wlse2
clear, clf
x = [1:2:20]; Nx = length(x); %changing input
xi = [1:200]/10; %interpolation points
eb = 0.4*ones(size(x)); %error bound for each y
y = [4.7334 2.1873 3.0067 1.4273 1.7787 1.2301 1.6052 1.5353 ...
     1.3985 2.0211];
[x,i] = sort(x); y = y(i); eb = eb(i); %sort the data for plotting
eby = y.*eb; %our estimation of error bounds
KC = 6; [th1c,err,y1] = curve_fit(x,y,KC,0,xi);
[thw1c,err,yw1] = curve_fit(x,y,KC,0,xi,eby);
errorbar(x,y,eby), hold on
plot(xi,y1,'b', xi,yw1,'r')


3.8.3 Exponential Curve Fit and Other Functions 

Why don't we use functions other than the polynomial function as candidates for fitting functions? There is no reason why we have to stick to the polynomial function, as illustrated in Example 3.7(b). In this section, we consider the case in which the data distribution or the theoretical background behind the data tells us that it is appropriate to fit the data into some nonpolynomial function. Suppose it is desired to fit the data into the following exponential function.

$$ ce^{ax} = y \tag{3.8.10} $$

Taking the natural logarithm of both sides, we linearize this as

$$ ax + \ln c = \ln y \tag{3.8.11} $$

so that the LS algorithm (3.8.3) can be applied to estimate the parameters a and ln c based on the data pairs {(x_k, ln y_k), k = 0 : M}.

Table 3.5 Linearization of Nonlinear Functions by Parameter/Data Transformation

     Function to Fit            Linearized Function                          Variable Substitution/Parameter Restoration
(1)  y = a/x + b                y = a(1/x) + b → y = ax' + b                 x' = 1/x
(2)  y = b/(x + a)              1/y = (1/b)x + a/b → y' = a'x + b'           y' = 1/y, a = b'/a', b = 1/a'
(3)  y = ab^x                   ln y = (ln b)x + ln a → y' = a'x + b'        y' = ln y, a = e^(b'), b = e^(a')
(4)  y = be^(ax)                ln y = ax + ln b → y' = ax + b'              y' = ln y, b = e^(b')
(5)  y = C − be^(−ax)           ln(C − y) = −ax + ln b → y' = a'x + b'       y' = ln(C − y), a = −a', b = e^(b')
(6)  y = ax^b                   ln y = b(ln x) + ln a → y' = a'x' + b'       y' = ln y, x' = ln x, a = e^(b'), b = a'
(7)  y = axe^(bx)               ln y − ln x = bx + ln a → y' = a'x + b'      y' = ln(y/x), a = e^(b'), b = a'
(8)  y = C/(1 + be^(ax))        ln(C/y − 1) = ax + ln b → y' = ax + b'       y' = ln(C/y − 1), b = e^(b')
     (a < 0, b > 0, C = y(∞))
(9)  y = a ln x + b             y = ax' + b                                  x' = ln x

Many other nonlinear relations can likewise be linearized to fit the LS algorithm, as listed in Table 3.5, which shows how widely the LS algorithm can be applied. If you are interested in making a MATLAB routine that implements what is listed in this table, see Problem 3.11. That problem also lets you try the MATLAB function “lsqcurvefit(f,th0,x,y)”, which enables one to use any type of function (f) for curve fitting.
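Following Eqs. (3.8.10) and (3.8.11), fitting y = ce^{ax} reduces to two steps: a straight-line LS fit of ln y against x, then the parameter restoration c = e^{ln c}. This is row (4) of Table 3.5 with b = c. The Python sketch below uses synthetic, noise-free data made up only for the check.

Note that this approach minimizes squared errors in ln y, not in y itself. On noisy data it will therefore generally differ somewhat from a direct nonlinear fit such as one obtained with lsqcurvefit().

```python
import math


def fit_exponential(x, y):
    """Fit y = c*exp(a*x) via the LS line a*x + ln c = ln y (requires y > 0)."""
    ly = [math.log(yk) for yk in y]
    M = len(x)
    sx, sy = sum(x), sum(ly)
    sxx = sum(xk * xk for xk in x)
    sxy = sum(xk * lk for xk, lk in zip(x, ly))
    det = M * sxx - sx * sx
    a = (M * sxy - sx * sy) / det
    lnc = (sxx * sy - sx * sxy) / det
    return a, math.exp(lnc)                 # parameter restoration c = e^(ln c)


x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [3.0 * math.exp(0.8 * xk) for xk in x]  # synthetic, noise-free data
print(fit_exponential(x, y))                # close to (0.8, 3.0)
```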


3.9 FOURIER TRANSFORM 

Most signals in the real world contain various frequency components: a rapidly changing signal contains high-frequency components, and a slowly changing one contains low-frequency components. Fourier series/transform is a mathematical tool that can be used to analyze the frequency characteristics of periodic/aperiodic signals. There are four similar definitions of Fourier series/transform, namely:

- continuous-time Fourier series (CtFS);
- continuous-time Fourier transform (CtFT);
- discrete-time Fourier transform (DtFT);
- discrete Fourier series/transform (DFS/DFT).

Among these tools, the DFT is the one that can easily and efficiently be programmed in computer languages, which is why we deal with just the DFT in this section.
Suppose a sequence of data {x[n] = x(nT), n = 0 : M − 1} (T: the sampling period) is obtained by sampling a continuous-time/space signal once every T seconds. The N(≥ M)-point DFT/IDFT (inverse DFT) pair is defined as

$$ \text{DFT:}\quad X(k) = \sum_{n=0}^{N-1} x[n]e^{-j2\pi nk/N},\quad k = 0 : N - 1 \tag{3.9.1a} $$

$$ \text{IDFT:}\quad x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X(k)e^{j2\pi nk/N},\quad n = 0 : N - 1 \tag{3.9.1b} $$
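Equations (3.9.1a) and (3.9.1b) translate directly into nested sums. The Python sketch below is an O(N^2) illustration, not how fft() works. It zero-pads M = 12 samples to N = 16, as the definition allows for N ≥ M. It then checks two facts discussed in this section: the IDFT recovers the (padded) sequence, and evaluating Eq. (3.9.1a) at k + N gives the same value as at k. The signal and helper names are our own.

```python
import cmath
import math


def dft(x, N=None):
    """Eq. (3.9.1a); x is zero-padded to length N if N > len(x)."""
    N = N or len(x)
    x = list(x) + [0.0] * (N - len(x))
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Eq. (3.9.1b)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]


def dft_at(x, k, N):
    """Eq. (3.9.1a) evaluated at an arbitrary integer k (zeros beyond len(x))."""
    return sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(len(x)))


x = [math.cos(2 * math.pi * 3 * n / 16) + 0.5 for n in range(12)]   # M = 12 samples
X = dft(x, 16)                                                      # N = 16 >= M
xr = idft(X)
print(max(abs(a - b) for a, b in zip(xr, x + [0.0] * 4)))   # round-off level
print(abs(dft_at(x, 3 + 16, 16) - X[3]))                    # X(k + N) = X(k)
```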

Remark 3.3. DFS/DFT (Discrete Fourier Series/Transform)

0. Note that the indices of the DFT/IDFT sequences appearing in MATLAB range from 1 to N.

1. Generally, the DFT coefficient X(k) is complex-valued. It denotes the magnitude and phase of the signal component having the digital frequency Ω_k = kΩ_0 = 2πk/N [rad], which corresponds to the analog frequency ω_k = kω_0 = kΩ_0/T = 2πk/NT [rad/s]. We call Ω_0 = 2π/N and ω_0 = 2π/NT (N: the size of the DFT) the digital/analog fundamental or resolution frequency. It is the minimum digital/analog frequency difference that can be distinguished by the N-point DFT.

2. The DFS and the DFT are essentially the same, but differ in the range of the time/frequency interval. More specifically, a signal x[n] and its DFT X(k) are of finite duration over the time/frequency ranges {0 ≤ n ≤ N − 1} and {0 ≤ k ≤ N − 1}, respectively. In contrast, a signal x[n] to be analyzed by the DFS and its DFS X(k) are periodic with the period N over the whole set of integers.

3. FFT (fast Fourier transform) means the computationally efficient algorithm developed by exploiting the periodicity and symmetry in the multiplying factor e^{−j2πnk/N}. It reduces the number of complex number multiplications from N^2 to (N/2)log_2 N (N: the size of the DFT). The MATLAB built-in functions “fft()”/“ifft()” implement the FFT/IFFT algorithm. The classic radix-2 FFT assumes data of length N = 2^l (l: a nonnegative integer), although MATLAB's fft() also accepts other lengths. If the length M of the original data sequence is not a power of 2, it can be extended by padding the tail part of the sequence with zeros, which is called zero-padding.

3.9.1 FFT Versus DFT 

As mentioned in item 3 of Remark 3.3, FFT/IFFT (inverse FFT) is the computationally efficient algorithm for computing the DFT/IDFT and is built into the MATLAB functions “fft()”/“ifft()”. In order to practice the use of these MATLAB functions and realize the computational advantage of FFT/IFFT over DFT/IDFT, we make the MATLAB program “compare_dft_fft.m”. Readers are encouraged to run this program and compare the execution times of the 1024-point DFT/IDFT computation and its FFT/IFFT counterpart. The resulting spectra are the same and thus overlap each other, as depicted in Fig. 3.12.


%compare_DFT_FFT
clear, clf
N = 2^10; n = [0:N - 1];
x = cos(2*pi*200/N*n) + 0.5*sin(2*pi*300/N*n);
tic
for k = 0:N - 1, X(k+1) = x*exp(-j*2*pi*k*n/N).'; end %DFT
k = [0:N - 1];
for n = 0:N - 1, xr(n + 1) = X*exp(j*2*pi*k*n/N).'/N; end %IDFT
time_dft = toc %elapsed time (in seconds) for DFT/IDFT
plot(k,abs(X)), pause, hold on
tic
X1 = fft(x); %FFT
xr1 = ifft(X1); %IFFT
time_fft = toc %elapsed time (in seconds) for FFT/IFFT
plot(k,abs(X1),'r') %magnitude spectrum in Fig. 3.12
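The Python sketch below shows where the savings come from, using a minimal recursive radix-2 decimation-in-time FFT. This is our own illustration; MATLAB's fft() relies on its own optimized library code. The sketch splits the sum of Eq. (3.9.1a) into even- and odd-indexed samples. Each level of the recursion then costs about N/2 multiplications by the twiddle factors e^{−j2πk/N}, and there are log_2 N levels. The test signal has the same normalized tones as compare_DFT_FFT, but with N = 256 to keep the O(N^2) reference DFT quick.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    E, O = fft(x[0::2]), fft(x[1::2])          # even- and odd-indexed halves
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * math.pi * k / N) * O[k]   # twiddle factor times odd part
        X[k] = E[k] + t
        X[k + N // 2] = E[k] - t
    return X


def dft(x):
    """Direct O(N^2) evaluation of Eq. (3.9.1a) for comparison."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            for k in range(N)]


N = 256
x = [math.cos(2 * math.pi * 50 * n / N) + 0.5 * math.sin(2 * math.pi * 75 * n / N)
     for n in range(N)]
X_fft, X_dft = fft(x), dft(x)
print(max(abs(a - b) for a, b in zip(X_fft, X_dft)))   # agreement up to rounding
```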


3.9.2 Physical Meaning of DFT 

In order to understand the physical meaning of the FFT, we make the MATLAB program “do_fft” and run it to get Fig. 3.13. This figure shows the magnitude spectra of the sampled data taken every T seconds from a two-tone analog signal

$$ x(t) = \sin(1.5\pi t) + 0.5\cos(3\pi t) \tag{3.9.2} $$

Readers are encouraged to complete the part of this program that produces Fig. 3.13c,d and run the program to see the plotting results (see Problem 3.16).

What information do the four spectra for the same analog signal x(t) carry?

- The magnitude of X_a(k) (Fig. 3.13a) is large at k = 2 and 5, corresponding to kω_0 = 2πk/NT = 2πk/3.2 = 1.25π ≈ 1.5π and 3.125π ≈ 3π, respectively.
- The magnitude of X_b(k) (Fig. 3.13b) is also large at k = 2 and 5, corresponding to kω_0 = 1.25π ≈ 1.5π and 3.125π ≈ 3π, respectively.
- The magnitude of X_c(k) (Fig. 3.13c) is large at k = 4, 5 and 9, 10. These can be taken to represent two tones of kω_0 = 2πk/NT = 2πk/6.4, spanning 1.25π to 1.5625π and 2.8125π to 3.125π.
- The magnitude of X_d(k) (Fig. 3.13d) is also large at k = 5 and 10, corresponding to kω_0 = 1.5625π ≈ 1.5π and 3.125π ≈ 3π, respectively.

Figure 3.12 The DFT(FFT) {X(k), k = 0 : N − 1} of x[n] = cos(2π × 200n/N) + 0.5 sin(2π × 300n/N) for n = 0 : N − 1 (N = 2^10 = 1024), with peaks at the digital frequencies Ω_200 = 2π × 200/N [rad] and Ω_300 = 2π × 300/N [rad].

%do_fft (to get Fig. 3.13)
clear, clf
w1 = 1.5*pi; w2 = 3*pi; %two tones
N = 32; n = [0:N - 1]; T = 0.1; %sampling period
t = n*T; xan = sin(w1*t) + 0.5*sin(w2*t);
subplot(421), stem(t,xan,'.')
k = 0:N - 1; Xa = fft(xan);
discrp = norm(xan - real(ifft(Xa))) %x[n] reconstructible from IFFT{X(k)}
subplot(423), stem(k,abs(Xa),'.')
%upsampling
N = 64; n = [0:N - 1]; T = 0.05; %sampling period
t = n*T; xbn = sin(w1*t) + 0.5*sin(w2*t);
subplot(422), stem(t,xbn,'.')
k = 0:N - 1; Xb = fft(xbn);
subplot(424), stem(k,abs(Xb),'.')
%zero-padding
N = 64; n = [0:N - 1]; T = 0.1; %sampling period
%(the remaining part for Fig. 3.13c,d is left to the reader; see Problem 3.16)

It is strange and interesting that we have many different DFT spectra for the same analog signal, depending on the DFT size, the sampling period, the whole interval, and zero-padding.

- Spectrum (b) was obtained by decreasing the sampling period T from 0.1 s to 0.05 s. Compared with spectrum (a), it has a wider analog frequency range [0, 2π/T_b], but the same analog resolution frequency ω_0 = Ω_0/T_b = 2π/N_bT_b = π/1.6 = 2π/N_aT_a. Consequently, it does not present us with any new information over (a), despite the increased number of data points. The shorter sampling period may be helpful in case the analog signal has some spectral contents of frequency higher than π/T_a.
- Spectrum (c), obtained by zero-padding, has a better-looking, smoother shape. However, its clarity is not much improved compared with (a) or (b), since the zeros essentially carry no valuable information in the time domain.
- In contrast with (b) and (c), spectrum (d), obtained by extending the whole time interval, shows us the spectral information more distinctly.

Note the following things:

• Zero-padding in the time domain yields an interpolation (smoothing) effect in the frequency domain and vice versa. This will be made use of for data smoothing in the next section (see Problem 3.19).

• If a signal is of finite duration and is zero outside its domain on the time axis, its spectrum is not discrete but continuous along the frequency axis. In contrast, the spectrum of a periodic signal is discrete, as can be seen in Fig. 3.12 or 3.13.

• The DFT values X(0) and X(N/2) represent the spectra of the dc component (0 × 2π/N = 0 [rad]) and the virtually highest digital frequency component (N/2 × 2π/N = π [rad]), respectively.

Here, we have something questionable. The DFT spectrum depicted in Fig. 3.12 clearly shows the digital frequency components Ω_200 = 2π × 200/N and Ω_300 = 2π × 300/N [rad] (N = 2^10 = 1024) contained in the discrete-time signal

$$ x[n] = \cos(2\pi \times 200n/N) + 0.5\sin(2\pi \times 300n/N),\quad N = 2^{10} = 1024 \tag{3.9.3} $$

So we can find the analog frequency components ω_k = Ω_k/T as long as the sampling period T is known. The DFT spectra depicted in Fig. 3.13, however, are so unclear that we cannot discern even the prominent frequency contents.

What's wrong with these spectra? It is never a 'right-or-wrong' problem. The only difference is this: the digital frequencies contained in the discrete-time signal of Eq. (3.9.3) are multiples of the fundamental frequency Ω_0 = 2π/N, but the analog frequencies contained in the continuous-time signal of Eq. (3.9.2) are not multiples of the fundamental frequency ω_0 = 2π/NT. In other words, the whole time interval [0, NT) is not a multiple of the period of each frequency to be detected. The phenomenon whereby the spectrum becomes blurred like this is called the 'leakage problem'.

The leakage problem occurs in most cases because we cannot choose the length of the whole time interval to be a multiple of the period of the signal unless we know the frequency contents of the signal in advance. And if we already knew the frequency contents of a signal, why would we bother to find its spectrum? As a measure to alleviate the leakage problem, there is a windowing technique [0-1, Section 11.2]. Interested readers can see Problem 3.18.
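The leakage described above is easy to reproduce. In the Python sketch below, a tone that completes exactly 5 cycles in N = 32 samples occupies just the bins k = 5 and k = N − 5. A tone with 5.5 cycles spreads over many bins. Counting the bins whose magnitude exceeds 1% of the peak makes the difference visible. The signals and the threshold are chosen only for illustration.

```python
import cmath
import math


def dft_mag(x):
    """Magnitudes |X(k)| of Eq. (3.9.1a)."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N)))
            for k in range(N)]


def significant_bins(x, rel=0.01):
    """Indices k with |X(k)| above rel times the largest magnitude."""
    mag = dft_mag(x)
    peak = max(mag)
    return [k for k, m in enumerate(mag) if m > rel * peak]


N = 32
on_grid = [math.cos(2 * math.pi * 5.0 * n / N) for n in range(N)]    # 5 whole cycles
off_grid = [math.cos(2 * math.pi * 5.5 * n / N) for n in range(N)]   # 5.5 cycles
print(significant_bins(on_grid))    # [5, 27]
print(len(significant_bins(off_grid)))
```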

Also note the periodicity, with period N (the DFT size), of the DFT sequence X(k) as well as of x[n]. This can be seen by substituting k + mN (m: any integer) for k in Eq. (3.9.1a) and n + mN for n in Eq. (3.9.1b). A real-world example reminding us of the periodicity of the DFT spectrum is the so-called stroboscopic effect: in a scene of a western movie, the wheel of a horse-drawn carriage looks as if it is spinning more slowly than it really is, or even in the reverse direction.

The periodicity of x[n] is surprising, because we cannot imagine that every discrete-time signal is periodic with the period N, which is the variable size of the DFT to be determined by us. As a matter of fact, the 'weird' periodicity of x[n] can be regarded as a kind of cost. We pay it for computing the sampled DFT spectrum instead of the continuous spectrum X(ω) of a continuous-time signal x(t), which is originally defined as

$$ X(\omega) = \int_{-\infty}^{\infty} x(t)e^{-j\omega t}\,dt \tag{3.9.4} $$

Actually, this is to blame for the blurred spectra of the two-tone signal depicted in Fig. 3.13.


3.9.3 Interpolation by Using DFS 


function [xi,Xi] = interpolation_by_DFS(T,x,Ws,ti)
%T : sampling interval (sample period)
%x : discrete-time sequence
%Ws: normalized stop frequency (1.0=pi[rad])
%ti: interpolation time range or # of divisions for T
if nargin < 4, ti = 5; end
if nargin < 3 || Ws > 1, Ws = 1; end
N = length(x);
if length(ti) == 1
  ti = 0:T/ti:(N-1)*T; %subinterval divided by ti
end
ks = ceil(Ws*N/2);
Xi = fft(x);
Xi(ks + 2:N - ks) = zeros(1,N - 2*ks - 1); %filtered spectrum
xi = zeros(1,length(ti));
for k = 2:N/2
  xi = xi + Xi(k)*exp(j*2*pi*(k - 1)*ti/N/T);
end
xi = real(2*xi + Xi(1) + Xi(N/2+1)*cos(pi*ti/T))/N; %Eq.(3.9.5)
%interpolate_by_DFS
clear, clf
w1 = pi; w2 = .5*pi; %two tones
N = 32; n = [0:N - 1]; T = 0.1; t = n*T;
x = sin(w1*t) + 0.5*sin(w2*t) + (rand(1,N) - 0.5); %0.2*sin(20*t);
ti = [0:T/5:(N - 1)*T];
subplot(411), plot(t,x,'k.') %original data sequence
title('original sequence and interpolated signal')
[xi,Xi] = interpolation_by_DFS(T,x,1,ti);
hold on, plot(ti,xi,'r') %reconstructed signal
k = [0:N - 1];
subplot(412), stem(k,abs(Xi),'k.') %original spectrum
title('original spectrum')
[xi,Xi] = interpolation_by_DFS(T,x,1/2,ti);
subplot(413), stem(k,abs(Xi),'r.') %filtered spectrum
title('filtered spectrum')
subplot(414), plot(t,x,'k.', ti,xi,'r') %filtered signal
title('filtered/smoothed signal')


We can use the DFS/DFT to interpolate a given sequence x[n] that is supposed to have been obtained by sampling some signal at equidistant points (instants). The procedure consists of two steps: take the N-point FFT X(k) of x[n], and then use the formula

$$ x(t) = \frac{1}{N}\Big\{\sum_{|k|<N/2} X(k)e^{j2\pi kt/NT} + X(N/2)\cos(\pi t/T)\Big\} = \frac{1}{N}\Big\{X(0) + 2\sum_{k=1}^{N/2-1}\mathrm{Re}\{X(k)e^{j2\pi kt/NT}\} + X(N/2)\cos(\pi t/T)\Big\} \tag{3.9.5} $$

where X(−k) = X(N − k).

This formula is cast into the routine “interpolation_by_DFS()”. It makes it possible to filter out the high-frequency portion over (Ws·π, (2 − Ws)π), with Ws given as the third input argument. The horizontal (time) range over which you want to interpolate the sequence can be given as the fourth input argument ti.

We make the MATLAB program “interpolate_by_DFS”, which applies the routine to interpolate a set of data obtained by sampling at equidistant points along the spatial or temporal axis, and run it to get Fig. 3.14.

- Figure 3.14a shows a data sequence x[n] of length N = 32 and its interpolation (reconstruction) x(t) from the 32-point DFS/DFT X(k) shown in Fig. 3.14b.
- Figure 3.14c shows the (zero-padded) DFT spectrum X'(k), with the digital frequency contents higher than π/2 [rad] (N/4 < k < 3N/4) removed.
- Figure 3.14d shows the smoothed interpolation (fitting curve) x'(t) obtained from X'(k).

This can be viewed as the smoothing effect in the time domain produced by zero-padding in the frequency domain. It is the dual of the smoothing effect in the frequency domain produced by zero-padding in the time domain, which was observed in Fig. 3.13c.

Figure 3.14 Interpolation/smoothing by using DFS/DFT ((c) the spectrum X'(k) of the filtered signal x'(t); (d) the filtered signal x'(t)).
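The Python sketch below transcribes Eq. (3.9.5), mirroring interpolation_by_DFS() including its band-zeroing step controlled by Ws. It uses a direct DFT in place of fft(). The function name and the test signals are our own. With Ws = 1, the interpolant passes through every sample. A tone lying inside the retained band is also reproduced between the samples.

```python
import cmath
import math


def interp_by_dfs(T, x, ti, Ws=1.0):
    """Evaluate Eq. (3.9.5) at the times ti; len(x) = N must be even."""
    N = len(x)
    X = [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
         for k in range(N)]
    ks = math.ceil(Ws * N / 2)
    for k in range(ks + 1, N - ks):          # same band as Xi(ks+2:N-ks) = 0 in MATLAB
        X[k] = 0
    xi = []
    for t in ti:
        s = sum(X[k] * cmath.exp(2j * math.pi * k * t / (N * T)) for k in range(1, N // 2))
        xi.append((X[0] + 2 * s + X[N // 2] * math.cos(math.pi * t / T)).real / N)
    return xi, X


N, T = 16, 0.1
x = [math.sin(2 * math.pi * 3 * n / N) + 0.3 * math.cos(2 * math.pi * 5 * n / N)
     for n in range(N)]
samples, _ = interp_by_dfs(T, x, [n * T for n in range(N)])
print(max(abs(a - b) for a, b in zip(samples, x)))          # round-off level

low = [math.cos(2 * math.pi * 2 * n / N) for n in range(N)]  # tone at k = 2
t = 0.37
smooth, _ = interp_by_dfs(T, low, [t], Ws=0.5)
print(smooth[0], math.cos(2 * math.pi * 2 * t / (N * T)))   # equal up to rounding
```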


PROBLEMS 

3.1 Quadratic Interpolation: Lagrange Polynomial and Newton Polynomial
(a) The second-degree Lagrange polynomial matching the three points (x_0, f_0), (x_1, f_1), and (x_2, f_2) can be written by substituting N = 2 into Eq. (3.1.3) as

$$ l_2(x) = \sum_{m=0}^{2} f_mL_{2,m}(x) = \sum_{m=0}^{2} f_m\prod_{k \ne m}^{2}\frac{x - x_k}{x_m - x_k} \tag{P3.1.1} $$

Check if the zero of the derivative of this polynomial, that is, the root of the equation l'_2(x) = 0, is found as

$$
\begin{aligned}
l'_2(x) &= f_0\frac{(x - x_1) + (x - x_2)}{(x_0 - x_1)(x_0 - x_2)} + f_1\frac{(x - x_2) + (x - x_0)}{(x_1 - x_2)(x_1 - x_0)} + f_2\frac{(x - x_0) + (x - x_1)}{(x_2 - x_0)(x_2 - x_1)} = 0\\
&\rightarrow\ f_0(2x - x_1 - x_2)(x_2 - x_1) + f_1(2x - x_2 - x_0)(x_0 - x_2) + f_2(2x - x_0 - x_1)(x_1 - x_0) = 0\\
&\rightarrow\ x = \frac{f_0(x_1^2 - x_2^2) + f_1(x_2^2 - x_0^2) + f_2(x_0^2 - x_1^2)}{2\{f_0(x_1 - x_2) + f_1(x_2 - x_0) + f_2(x_0 - x_1)\}}
\end{aligned}
$$

You can use the symbolic computation capability of MATLAB by typing the following statements into the MATLAB command window:

>>syms x x0 x1 x2 f0 f1 f2
>>L2 = f0*(x - x1)*(x - x2)/(x0 - x1)/(x0 - x2) + ...
       f1*(x - x2)*(x - x0)/(x1 - x2)/(x1 - x0) + ...
       f2*(x - x0)*(x - x1)/(x2 - x0)/(x2 - x1);
>>pretty(solve(diff(L2)))
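The following Python sketch is a numerical sanity check of (P3.1.2). It takes three points on a known parabola, evaluates the closed-form vertex, and confirms that the central-difference derivative of l_2(x) vanishes there. The function names are illustrative, not the book's.

```python
def lagrange2(x, xs, fs):
    """Second-degree Lagrange polynomial (P3.1.1) through (xs[m], fs[m]), m = 0, 1, 2."""
    total = 0.0
    for m in range(3):
        term = fs[m]
        for k in range(3):
            if k != m:
                term *= (x - xs[k]) / (xs[m] - xs[k])
        total += term
    return total

def vertex_p312(xs, fs):
    """Zero of l2'(x) according to the closed form (P3.1.2)."""
    x0, x1, x2 = xs
    f0, f1, f2 = fs
    num = f0 * (x1**2 - x2**2) + f1 * (x2**2 - x0**2) + f2 * (x0**2 - x1**2)
    den = 2 * (f0 * (x1 - x2) + f1 * (x2 - x0) + f2 * (x0 - x1))
    return num / den

# three points on the parabola f(x) = (x - 1.3)^2 + 2, whose vertex is at x = 1.3
xs = [0.0, 1.0, 3.0]
fs = [(v - 1.3) ** 2 + 2 for v in xs]
xv = vertex_p312(xs, fs)
h = 1e-5
slope = (lagrange2(xv + h, xs, fs) - lagrange2(xv - h, xs, fs)) / (2 * h)
print(xv, slope)
```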

(b) The second-degree Newton polynomial matching the three points (x_0, f_0), (x_1, f_1), and (x_2, f_2) is Eq. (3.2.4):

$$ n_2(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) \qquad (P3.1.3) $$

where

$$ a_0 = f_0, \quad a_1 = Df_0 = \frac{f_1 - f_0}{x_1 - x_0}, \quad a_2 = D^2 f_0 = \frac{Df_1 - Df_0}{x_2 - x_0} = \frac{\dfrac{f_2 - f_1}{x_2 - x_1} - \dfrac{f_1 - f_0}{x_1 - x_0}}{x_2 - x_0} \qquad (P3.1.4) $$


Find the zero of the derivative of this polynomial. 

(c) From Eq. (P3.1.1) with x_0 = −1, x_1 = 0, and x_2 = 1, find the coefficients of the Lagrange coefficient polynomials L_{2,0}(x), L_{2,1}(x), and L_{2,2}(x). It is convenient to use the routine “lagranp()” for this job.

(d) From the third-degree Lagrange polynomial matching the four points (x_0, f_0), (x_1, f_1), (x_2, f_2), and (x_3, f_3) with x_0 = −3, x_1 = −2, x_2 = −1, and x_3 = 0, find the coefficients of the Lagrange coefficient polynomials L_{3,0}(x), L_{3,1}(x), L_{3,2}(x), and L_{3,3}(x). Again, the routine “lagranp()” is convenient for this job.
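For readers without “lagranp()” at hand, here is a hedged Python stand-in. It builds the coefficient lists of L_{N,m}(x) by multiplying the linear factors (x − x_k)/(x_m − x_k). Coefficients are in descending powers, following MATLAB's polyval convention. The helper names are hypothetical.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists in descending powers."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def lagrange_basis_coeffs(xs):
    """Coefficients (descending powers) of L_{N,m}(x) for m = 0..N and nodes xs."""
    basis = []
    for m, xm in enumerate(xs):
        p = [1.0]
        for k, xk in enumerate(xs):
            if k != m:
                # multiply by (x - xk)/(xm - xk), i.e. by [1, -xk]/(xm - xk)
                p = poly_mul(p, [1.0 / (xm - xk), -xk / (xm - xk)])
        basis.append(p)
    return basis

for row in lagrange_basis_coeffs([-1.0, 0.0, 1.0]):         # part (c)
    print(row)
for row in lagrange_basis_coeffs([-3.0, -2.0, -1.0, 0.0]):  # part (d)
    print(row)
```

For part (c), this gives the following polynomials (up to signed zeros):
- L_{2,0}(x) = 0.5x² − 0.5x
- L_{2,1}(x) = −x² + 1
- L_{2,2}(x) = 0.5x² + 0.5x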

3.2 Error Analysis of Interpolation Polynomial

Consider the error between a true (unknown) function f(x) and the interpolation polynomial P_N(x) of degree N for some (N + 1) points of y = f(x), that is,

{(x_0, y_0), (x_1, y_1), ..., (x_N, y_N)}

where f(x) is (N + 1) times differentiable. The error is also a function of x, and it becomes zero at the (N + 1) points. We can therefore write it as

$$ e(x) = f(x) - P_N(x) = (x - x_0)(x - x_1)\cdots(x - x_N)\,g(x) \qquad (P3.2.1) $$

Technically, we define an auxiliary function w(t) with respect to t as

$$ w(t) = f(t) - P_N(t) - (t - x_0)(t - x_1)\cdots(t - x_N)\,g(x) \qquad (P3.2.2) $$

Then this function is zero at the (N + 2) points t = x_0, x_1, ..., x_N, x. By Rolle's theorem, its 1st/2nd/.../(N + 1)th-order derivatives therefore have at least (N + 1)/N/.../1 zeros, respectively. For t = t_0 such that w^{(N+1)}(t_0) = 0, we have

$$ w^{(N+1)}(t_0) = f^{(N+1)}(t_0) - 0 - (N + 1)!\,g(x) = 0; \qquad g(x) = \frac{1}{(N + 1)!}\,f^{(N+1)}(t_0) \qquad (P3.2.3) $$

Based on this, show that the error function can be rewritten as

$$ e(x) = f(x) - P_N(x) = (x - x_0)(x - x_1)\cdots(x - x_N)\,\frac{f^{(N+1)}(t_0)}{(N + 1)!} \qquad (P3.2.4) $$
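Here is a quick numerical illustration of (P3.2.4), sketched in Python. It assumes f(x) = sin x, so that f^{(4)}(x) = sin x, and uses four nodes on [0, 1.5]. Dividing the actual error by (x − x_0)···(x − x_3)/4! should recover f^{(4)}(t_0) = sin t_0 for some t_0 in the interval, so the printed ratio must lie strictly between 0 and 1. The variable names are illustrative.

```python
import math

def lagrange_eval(x, xs, ys):
    """Value at x of the Lagrange polynomial through the points (xs[m], ys[m])."""
    total = 0.0
    for m, (xm, ym) in enumerate(zip(xs, ys)):
        term = ym
        for k, xk in enumerate(xs):
            if k != m:
                term *= (x - xk) / (xm - xk)
        total += term
    return total

xs = [0.0, 0.5, 1.0, 1.5]          # N + 1 = 4 nodes, so N = 3
ys = [math.sin(v) for v in xs]
x = 0.25
err = math.sin(x) - lagrange_eval(x, xs, ys)
w = 1.0
for xk in xs:
    w *= (x - xk)                  # (x - x0)(x - x1)(x - x2)(x - x3)
implied = err * math.factorial(len(xs)) / w   # equals f''''(t0) = sin(t0) by (P3.2.4)
print(err, implied)
```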


3.3 The Approximation of a Cosine Function 

In the way suggested below, find an approximate polynomial of degree 4 
for 

$$ y = f(x) = \cos x \qquad (P3.3.1) $$

(a) Find the Lagrange/Newton polynomial of degree 4 matching the following five points and plot the resulting polynomial together with the true function cos x over [−π, +π].

k         0      1       2     3       4
x_k       −π     −π/2    0     +π/2    +π
f(x_k)    −1     0       1     0       −1


(b) Find the Lagrange/Newton polynomial of degree 4 matching the following five points and plot the resulting polynomial on the same graph as the result of (a).

k         0               1               2     3               4
x_k       π cos(9π/10)    π cos(7π/10)    0     π cos(3π/10)    π cos(π/10)
f(x_k)    −0.9882         −0.2723         1     −0.2723         −0.9882

(c) Find the Chebyshev polynomial of degree 4 for cos x over [−π, +π] and plot the resulting polynomial on the same graph as the results of (a) and (b).

3.4 Chebyshev Nodes 

The speed/pressure of a liquid flowing in a pipe of irregular radius differs from place to place. Suppose you are to install seven speed/pressure gauges along a pipe of length 4 m, as depicted in Fig. P3.4. How would you determine the positions of the gauges so that the maximum error of estimating the speed/pressure over the interval [0, 4] is minimized?

Figure P3.4 Chebyshev nodes.
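The nodes in the table of Problem 3.3(b) are Chebyshev nodes, and Problem 3.4 asks for the same kind of nodes on [0, 4]. Below is a minimal Python sketch, assuming the standard zeros of T_n(x), namely cos((2k + 1)π/(2n)), mapped linearly from [−1, 1] onto [a, b]. The function name is hypothetical.

```python
import math

def chebyshev_nodes(n, a, b):
    """n Chebyshev nodes (zeros of T_n) mapped from [-1, 1] onto [a, b], increasing."""
    return [(a + b) / 2 + (b - a) / 2 * math.cos((2 * k + 1) * math.pi / (2 * n))
            for k in reversed(range(n))]

# Problem 3.3(b): five nodes on [-pi, pi] and the corresponding cos values
nodes5 = chebyshev_nodes(5, -math.pi, math.pi)
print([round(math.cos(v), 4) for v in nodes5])
# Problem 3.4: seven gauge positions along a 4 m pipe
print([round(v, 4) for v in chebyshev_nodes(7, 0.0, 4.0)])
```

The nodes cluster toward both ends of the interval, which is what keeps the interpolation error bounded near the boundaries.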

3.5 Pade Approximation 


For the Laplace transform

$$ F(s) = e^{-sT} \qquad (P3.5.1) $$

representing a delay of T [seconds], we can write its Maclaurin series expansion up to fifth order as

$$ M_c(s) = 1 - sT + \frac{(sT)^2}{2!} - \frac{(sT)^3}{3!} + \frac{(sT)^4}{4!} - \frac{(sT)^5}{5!} \qquad (P3.5.2) $$

(a) Show that we can solve Eq. (3.4.4) and use Eq. (3.4.1) to get the Pade approximation as

$$ F(s) \cong p_{1,1}(s) = \frac{q_0 + q_1 s}{1 + d_1 s} = \frac{1 - (T/2)s}{1 + (T/2)s} \cong e^{-Ts} \qquad (P3.5.3) $$

(b) Compose a MATLAB program “nm3p05.m” that uses the routine “padeap()” to generate the Pade approximation of (P3.5.1) with T = 0.2. The program should plot it together with the second-order Maclaurin series expansion and the true function (P3.5.1) for s = [−5, 10]. Also run it to see the result as

$$ p_{1,1}(s) = \frac{1 - (T/2)s}{1 + (T/2)s} = \frac{-s + 10}{s + 10} \qquad (P3.5.4) $$
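The following Python sketch shows one standard way to obtain p_{M,N}(s) from Maclaurin coefficients:
1. Solve the N linear equations that cancel the s^{M+1}, ..., s^{M+N} terms, which gives the denominator.
2. Form the numerator by convolving the coefficients with the denominator.

It is not the book's “padeap()” routine, and we do not claim it follows Eqs. (3.4.1)/(3.4.4) term by term. The function names are hypothetical.

```python
import math

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def pade_sketch(c, M, N):
    """Pade approximant (q_0 + ... + q_M s^M)/(1 + d_1 s + ... + d_N s^N)
    from the Maclaurin coefficients c[0..M+N] (ascending powers)."""
    # denominator: c_k + sum_{j=1}^{N} d_j c_{k-j} = 0 for k = M+1, ..., M+N
    A = [[c[k - j] if k - j >= 0 else 0.0 for j in range(1, N + 1)]
         for k in range(M + 1, M + N + 1)]
    b = [-c[k] for k in range(M + 1, M + N + 1)]
    d = solve_linear(A, b)
    dd = [1.0] + d
    # numerator: q_i = sum_{j=0}^{min(i,N)} d_j c_{i-j}, with d_0 = 1
    q = [sum(dd[j] * c[i - j] for j in range(min(i, N) + 1)) for i in range(M + 1)]
    return q, d

T = 0.2
c = [(-T) ** k / math.factorial(k) for k in range(3)]   # 1 - sT + (sT)^2/2!
q, d = pade_sketch(c, 1, 1)
print(q, d)
```

For T = 0.2, it returns q ≈ [1, −0.1] and d ≈ [0.1] (up to rounding). That is (1 − 0.1s)/(1 + 0.1s) = (−s + 10)/(s + 10), in agreement with (P3.5.4).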


3.6 Rational Function Interpolation: Bulirsch-Stoer Method [S-3] 

Table P3.6 shows the Bulirsch-Stoer method, where the element in the mth row and the (i + 1)th column is computed by the following formula:

$$ R_m^{i+1} = R_{m+1}^{i} + \frac{(x - x_{m+i})(R_{m+1}^{i} - R_m^{i})(R_{m+1}^{i} - R_{m+1}^{i-1})}{(x - x_m)(R_m^{i} - R_{m+1}^{i-1}) - (x - x_{m+i})(R_{m+1}^{i} - R_{m+1}^{i-1})} $$

with R_m^0 = 0 and R_m^1 = y_m for i = 1 : N and m = 1 : N − i   (P3.6.1)


function yi = rational_interpolation(x,y,xi)
N = length(x); Ni = length(xi);
R(:,1) = y(:);
for n = 1:Ni
  xn = xi(n);
  for i = 1:N - 1
    for m = 1:N - i
      RR1 = R(m + 1,i); RR2 = R(m,i);
      if i > 1
        RR1 = RR1 - R(m + 1,???); RR2 = RR2 - R(???,i - 1);
      end
      tmp1 = (xn - x(???))*RR1;
      num = tmp1*(R(???,i) - R(m,?));
      den = (xn - x(?))*RR2 - tmp1;
      R(m,i + 1) = R(m + 1,i) ????????;
    end
  end
  yi(n) = R(1,N);
end


Table P3.6 Bulirsch-Stoer Method for Rational Function Interpolation

Data          i = 1            i = 2      i = 3      i = 4
(x_1, y_1)    R_1^1 = y_1      R_1^2      R_1^3      R_1^4
(x_2, y_2)    R_2^1 = y_2      R_2^2      R_2^3
(x_3, y_3)    R_3^1 = y_3      R_3^2
(x_m, y_m)
(a) The above routine “rational_interpolation(x,y,xi)” uses the Bulirsch-Stoer method to interpolate the set of data pairs (x,y), given as its first/second input arguments, over a set of intermediate points xi, given as its third input argument. Complete the routine and apply it to interpolate the four data points {(−1, f(−1)), (−0.2, f(−0.2)), (0.1, f(0.1)), (0.8, f(0.8))} on the graph of f(x) = 1/(1 + 8x^2) for xi = [-100:100]/100. Plot the interpolated curve together with the graph of the true function f(x).
- Does it work well?
- How about doing the same job with another routine “rat_interp()” listed in Section 8.3 of [F-1]?
- What are the values of yi([95:97]) obtained from the two routines?
- If you come across anything odd in the graphic results and/or the output numbers, what is your explanation?

(cf) MATLAB expresses the indeterminate form 0/0 (zero divided by zero) as NaN (Not-a-Number) and skips that value when plotting it on a graph. For plotting purposes, therefore, it may be better not to give the indeterminate case any special treatment.

(b) Apply the Pade approximation routine “padeap()” (with M = 2 and N = 2) to generate the rational function approximating f(x) = 1/(1 + 8x^2), and compare the result with the true function f(x).

(c) To compare the rational interpolation method with the Pade approximation scheme, apply the routines rational_interpolation() and padeap() (with M = 3 and N = 2). Use them to interpolate the four data points {(−2, f(−2)), (−1, f(−1)), (1, f(1)), (2, f(2))} on the graph of f(x) = sin(x) for xi = [-100:100]*pi/100. Plot the interpolated curve together with the graph of the true function. How do the approximation/interpolation results compare?
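Here is a Python transcription of recurrence (P3.6.1), offered as a hedged sketch of the tableau in Table P3.6 rather than as the book's completed routine. The function name is hypothetical, and indices are shifted to Python's 0-based convention.

```python
def rational_interp_sketch(x, y, xi):
    """Bulirsch-Stoer rational interpolation via the recurrence (P3.6.1).
    R[m][i] stores R^i_{m+1} (0-based row m); column 0 holds R^0 = 0."""
    N = len(x)
    out = []
    for xn in xi:
        R = [[0.0] * (N + 1) for _ in range(N)]
        for m in range(N):
            R[m][1] = y[m]                       # R^1_m = y_m
        for i in range(1, N):
            for m in range(N - i):
                A = R[m + 1][i]                  # R^i_{m+1}
                B = R[m][i]                      # R^i_m
                C = R[m + 1][i - 1]              # R^{i-1}_{m+1}
                t = (xn - x[m + i]) * (A - C)
                den = (xn - x[m]) * (B - C) - t  # may vanish, e.g. when xn hits a node
                R[m][i + 1] = A + t * (A - B) / den
        out.append(R[0][N])                      # R^N_1
    return out

# three samples of (x + 1)/(x + 2), evaluated away from the nodes
print(rational_interp_sketch([0.0, 1.0, 2.0], [1 / 2, 2 / 3, 3 / 4], [3.0]))
```

As a check, consider three samples of (x + 1)/(x + 2), which is itself a ratio of linear polynomials. The sketch reproduces the function's value at x = 3, namely 0.8, up to rounding.

A zero denominator raises ZeroDivisionError here, whereas MATLAB would return Inf or NaN instead.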

3.7 Smoothness of a Cubic Spline Function 

We claim that the cubic spline interpolation function s(x) has the smoothness property

$$ \int_{x_0}^{x_N} (s''(x))^2\,dx \le \int_{x_0}^{x_N} (f''(x))^2\,dx \qquad (P3.7.1) $$

for any second-order differentiable function f(x) that matches the given grid points and has the same first-order derivatives as s(x) at the grid points. This implies that, in the sense of (P3.7.1), the cubic spline is the least rugged of all such functions. Prove it by doing the following.

(a) Check the validity of the equality

$$ \int_{x_k}^{x_{k+1}} f''(x)s''(x)\,dx = \int_{x_k}^{x_{k+1}} (s''(x))^2\,dx \qquad (P3.7.2) $$

where the left-hand and right-hand sides of this equation are

$$ \text{LHS: } \int_{x_k}^{x_{k+1}} f''(x)s''(x)\,dx = f'(x)s''(x)\Big|_{x_k}^{x_{k+1}} - \int_{x_k}^{x_{k+1}} f'(x)s'''(x)\,dx = f'(x_{k+1})s''(x_{k+1}) - f'(x_k)s''(x_k) - C\{f(x_{k+1}) - f(x_k)\} \qquad (P3.7.3a) $$

$$ \text{RHS: } \int_{x_k}^{x_{k+1}} s''(x)s''(x)\,dx = s'(x_{k+1})s''(x_{k+1}) - s'(x_k)s''(x_k) - C\{s(x_{k+1}) - s(x_k)\} \qquad (P3.7.3b) $$

Here C = s'''(x) is constant on [x_k, x_{k+1}], since s(x) is a cubic polynomial on each subinterval.

(b) Check the validity of the following inequality:

$$ 0 \le \int_{x_k}^{x_{k+1}} (f''(x) - s''(x))^2\,dx = \int_{x_k}^{x_{k+1}} (f''(x))^2\,dx - 2\int_{x_k}^{x_{k+1}} f''(x)s''(x)\,dx + \int_{x_k}^{x_{k+1}} (s''(x))^2\,dx \overset{(P3.7.2)}{=} \int_{x_k}^{x_{k+1}} (f''(x))^2\,dx - \int_{x_k}^{x_{k+1}} (s''(x))^2\,dx $$

$$ \int_{x_k}^{x_{k+1}} (s''(x))^2\,dx \le \int_{x_k}^{x_{k+1}} (f''(x))^2\,dx \qquad (P3.7.4) $$

3.8 MATLAB Built-in Routine for Cubic Spline 
There are two MATLAB built-in routines:

>>yi = spline(x,y,xi);
>>yi = interp1(x,y,xi,'spline');

Both receive a set of data points (x, y) and return the values of the cubic spline interpolating function s(x) at the (intermediate) points xi given as the third input argument. Write a program that uses these MATLAB routines to get the interpolation for the set of data points

{(0, 0), (0.5, 2), (2, −2), (3.5, 2), (4, 0)}

and plots the results for [0, 4]. In this program, append statements that do the same job by using the routine “cspline(x,y,KC)” (Section 3.5) with KC = 1, 2, and 3. Which one yields the same result as the MATLAB built-in routine? What kind of boundary condition does the MATLAB built-in routine assume?
3.9 Robot Path Planning Using Cubic Spline

Every object having a mass is subject to the law of inertia. Its velocity, described by the first derivative of its displacement with respect to time, must therefore be continuous in any direction. In this context, the cubic spline, having continuous derivatives up to second order, is a good basis for planning the robot path/trajectory. We will determine the path of a robot in such a way that the following conditions are satisfied:

• At time t = 0 s, the robot starts from its home position (0, 0) with zero initial velocity, passing through the intermediate point (1, 1) at t = 1 s and arriving at the final point (2, 4) at t = 2 s.

• On arriving at (2, 4), it departs from that point at t = 2 s, passing through the intermediate point (3, 3) at t = 3 s and arriving at the point (4, 2) at t = 4 s.

• On arriving at (4, 2), it departs from that point, passing through the intermediate point (2, 1) at t = 5 s and then returning to the home position (0, 0) at t = 6 s.

More specifically, on the tx plane we need

• the spline interpolation matching the three points (0, 0), (1, 1), (2, 2) and having zero velocity at both boundary points (0, 0) and (2, 2),

• the spline interpolation matching the three points (2, 2), (3, 3), (4, 4) and having zero velocity at both boundary points (2, 2) and (4, 4), and

• the spline interpolation matching the three points (4, 4), (5, 2), (6, 0) and having zero velocity at both boundary points (4, 4) and (6, 0).

On the ty plane, we need

• the spline interpolation matching the three points (0, 0), (1, 1), (2, 4) and having zero velocity at both boundary points (0, 0) and (2, 4),

• the spline interpolation matching the three points (2, 4), (3, 3), (4, 2) and having zero velocity at both boundary points (2, 4) and (4, 2), and

• the spline interpolation matching the three points (4, 2), (5, 1), (6, 0) and having zero velocity at both boundary points (4, 2) and (6, 0).

Supplement the following incomplete program “robot_path”. Its objective is to make the required spline interpolations and plot the whole robot path obtained through them on the xy plane. Run it to get the graph depicted in Fig. P3.9c.




x1 = [0 1 2]; y1 = [0 1 4]; t1 = [0 1 2];
% ...
xi1 = cspline(t1,x1,ti1); yi1 = cspline(t1,y1,ti1);
% ...
plot(xi1,yi1,'k', xi2,yi2,'b', xi3,yi3)
plot([x1(1) x2(1) x3(1) x3(end)],[y1(1) y2(1) y3(1) y3(end)],'o')
plot([x1 x2 x3],[y1 y2 y3],'k+'), axis([0 5 0 5])

Figure P3.9 Robot path planning using the cubic spline interpolation.
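Each leg of the robot path is a clamped (zero end-velocity) cubic spline through three points. Below is a minimal Python sketch of such a spline. It is written in terms of the second derivatives M_i = s''(x_i) and solved with the tridiagonal (Thomas) algorithm. It is a stand-in for “cspline()”, whose exact interface is not reproduced here, and the function name is hypothetical.

```python
def clamped_spline_sketch(x, y, dy0, dyn):
    """Clamped cubic spline through (x[i], y[i]) with s'(x[0]) = dy0, s'(x[-1]) = dyn.
    Returns a function s(t) valid on [x[0], x[-1]]."""
    n = len(x) - 1
    h = [x[i + 1] - x[i] for i in range(n)]
    a, b, c, d = [0.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    # end condition at x[0]: 2 h0 M0 + h0 M1 = 6((y1 - y0)/h0 - dy0)
    b[0], c[0] = 2 * h[0], h[0]
    d[0] = 6 * ((y[1] - y[0]) / h[0] - dy0)
    for i in range(1, n):  # continuity of s' at the interior knots
        a[i], b[i], c[i] = h[i - 1], 2 * (h[i - 1] + h[i]), h[i]
        d[i] = 6 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    # end condition at x[n]: h M_{n-1} + 2 h M_n = 6(dyn - (y_n - y_{n-1})/h)
    a[n], b[n] = h[n - 1], 2 * h[n - 1]
    d[n] = 6 * (dyn - (y[n] - y[n - 1]) / h[n - 1])
    for i in range(1, n + 1):  # Thomas algorithm: forward elimination
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    M = [0.0] * (n + 1)
    M[n] = d[n] / b[n]
    for i in range(n - 1, -1, -1):  # back substitution
        M[i] = (d[i] - c[i] * M[i + 1]) / b[i]

    def s(t):
        i = 0
        while i < n - 1 and t > x[i + 1]:
            i += 1
        hi, u, v = h[i], x[i + 1] - t, t - x[i]
        return ((M[i] * u**3 + M[i + 1] * v**3) / (6 * hi)
                + (y[i] / hi - M[i] * hi / 6) * u + (y[i + 1] / hi - M[i + 1] * hi / 6) * v)
    return s

# first leg: x(t) through (0,0),(1,1),(2,2) and y(t) through (0,0),(1,1),(2,4)
sx = clamped_spline_sketch([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], 0.0, 0.0)
sy = clamped_spline_sketch([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 0.0, 0.0)
print([(round(sx(t), 4), round(sy(t), 4)) for t in (0.0, 0.5, 1.0, 1.5, 2.0)])
```

For the first leg on the ty plane, with points (0, 0), (1, 1), (2, 4), the spline reduces to t³ on [0, 1], so y(0.5) = 0.125.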


3.10 One-Dimensional Interpolation 

What do you have to give as the fourth input argument of the MATLAB built-in routine “interp1()” in order to get the same result as would be obtained by using the following one-dimensional interpolation routine “intrp1()”? What letter would you see if you apply this routine to interpolate the data points {(0, 3), (1, 0), (2, 3), (3, 0), (4, 3)} for [0, 4]?



3.11 Least-Squares Curve Fitting 

(a) Table 3.5 lists several nonlinear relations that can be linearized to fit the LS algorithm. The MATLAB routine “curve_fit()” implements all the schemes that use the LS method to find the parameters of these template relations, but the parts for relations (1), (2), (7), (8), and (9) are missing. Supplement the missing parts to complete the routine.

(b) The program “nm3p11.m” does the following:
- generates the 12 sets of data pairs according to various types of relations (functions);
- applies the routines “curve_fit()”/“lsqcurvefit()” to find the parameters of the template relations;
- plots the data pairs on the fitting curves obtained from the template functions with the estimated parameters.

Complete and run it to get graphs like Fig. P3.11. Answer the following questions.
(i) Find the case(s), if any, where the results of the two routines differ greatly. For such case(s), try another initial guess of the parameters, th0 = [1 1], instead of th0 = [0 0].

(ii) If the MATLAB built-in routine “lsqcurvefit()” yields a bad result, does it always give you a warning message? How do the two routines compare?
(cf)


%nm3p11 to plot Fig.P3.11 by curve fitting
clear
x = [1:20]*2 - 0.1; Nx = length(x);
noise = rand(1,Nx) - 0.5; % 1xNx random noise generator
xi = [1:40] - 0.5; %interpolation points
figure(1), clf
a = 0.1; b = -1; c = -50; %Table 3.5(0)
y = a*x.^2 + b*x + c + 10*noise(1:Nx);
[th,err,yi] = curve_fit(x,y,0,2,xi); [a b c],th %if you want parameters
f = @(th,x) th(1)*x.^2 + th(2)*x + th(3);
[th,err] = lsqcurvefit(f,[0 0 0],x,y), yi1 = f(th,xi);
subplot(321), plot(x,y,'*', xi,yi,'k', xi,yi1,'r')
a = 2; b = 1; y = a./x + b + 0.1*noise(1:Nx); %Table 3.5(1)
[th,err,yi] = curve_fit(x,y,1,0,xi); [a b],th
f = @(th,x) th(1)./x + th(2);
th0 = [0 0]; [th,err] = lsqcurvefit(f,th0,x,y), yi1 = f(th,xi);
subplot(322), plot(x,y,'*', xi,yi,'k', xi,yi1,'r')
a = -20; b = -9; y = b./(x + a) + 0.4*noise(1:Nx); %Table 3.5(2)
[th,err,yi] = curve_fit(x,y,2,0,xi); [a b],th
f = @(th,x) th(2)./(x + th(1));
th0 = [0 0]; [th,err] = lsqcurvefit(f,th0,x,y), yi1 = f(th,xi);
subplot(323), plot(x,y,'*', xi,yi,'k', xi,yi1,'r')
a = 2.; b = 0.95; y = a*b.^x + 0.5*noise(1:Nx); %Table 3.5(3)
[th,err,yi] = curve_fit(x,y,3,0,xi); [a b],th
f = @(th,x) th(1)*th(2).^x;
th0 = [0 0]; [th,err] = lsqcurvefit(f,th0,x,y), yi1 = f(th,xi);
subplot(324), plot(x,y,'*', xi,yi,'k', xi,yi1,'r')
a = 0.1; b = 1; y = b*exp(a*x) + 2*noise(1:Nx); %Table 3.5(4)
[th,err,yi] = curve_fit(x,y,4,0,xi); [a b],th
f = @(th,x) th(2)*exp(th(1)*x);
th0 = [0 0]; [th,err] = lsqcurvefit(f,th0,x,y), yi1 = f(th,xi);
subplot(325), plot(x,y,'*', xi,yi,'k', xi,yi1,'r')
a = 0.1; b = 1; %Table 3.5(5)
y = -b*exp(-a*x); C = -min(y) + 1; y = C + y + 0.1*noise(1:Nx);
[th,err,yi] = curve_fit(x,y,5,C,xi); [a b],th
f = @(th,x) 1 - th(2)*exp(-th(1)*x);
th0 = [0 0]; [th,err] = lsqcurvefit(f,th0,x,y), yi1 = f(th,xi);
subplot(326), plot(x,y,'*', xi,yi,'k', xi,yi1,'r')
figure(2), clf
a = 0.5; b = 0.5; y = a*x.^b + 0.2*noise(1:Nx); %Table 3.5(6a)
[th,err,yi] = curve_fit(x,y,0,2,xi); [a b],th
f = @(th,x) th(1)*x.^th(2);
th0 = [0 0]; [th,err] = lsqcurvefit(f,th0,x,y), yi1 = f(th,xi);
subplot(321), plot(x,y,'*', xi,yi,'k', xi,yi1,'r')
a = 0.5; b = -0.5; %Table 3.5(6b)
y = a*x.^b + 0.05*noise(1:Nx);
[th,err,yi] = curve_fit(x,y,6,0,xi); [a b],th
f = @(th,x) th(1)*x.^th(2);
th0 = [0 0]; [th,err] = lsqcurvefit(f,th0,x,y), yi1 = f(th,xi);
subplot(322), plot(x,y,'*', xi,yi,'k', xi,yi1,'r')


If there is no theoretical basis on which we can infer the physical relation between the variables, how do we determine a candidate function suitable for fitting the data pairs? We can plot the data pairs, pick the graph in Fig. P3.11 that most closely resembles the plot, and take the corresponding template function as the candidate fitting function.

Figure P3.11 LS fitting curves for data pairs with various relations.


3.12 Two-Dimensional Interpolation 

Compose a routine “z = find_depth(xi,yi)” that finds the depth z of a geological stratum at a point (xi,yi) given as the input arguments, based on the data in Problem 1.4.

(cf) If you have no idea, insert just one statement involving “interp2()” into the program “nm1p04.m” (Problem 1.4) and fit it into the format of a MATLAB function.
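If Problem 1.4's data are laid out on a rectangular (x, y) grid, a depth query reduces to bilinear interpolation, which is what interp2() does with its default linear method. The following is a hedged, self-contained Python sketch. The grid layout (Z[j][i] is the value at (xg[i], yg[j])) and the names are our assumptions, not the book's.

```python
def bilinear(xg, yg, Z, xi, yi):
    """Bilinear interpolation on a rectangular grid; Z[j][i] is the value at (xg[i], yg[j]).
    xg and yg must be increasing."""
    def cell(g, v):
        # index i such that g[i] <= v <= g[i+1] (the last cell is used beyond the end)
        for i in range(len(g) - 1):
            if v <= g[i + 1]:
                return i
        return len(g) - 2
    i, j = cell(xg, xi), cell(yg, yi)
    u = (xi - xg[i]) / (xg[i + 1] - xg[i])
    v = (yi - yg[j]) / (yg[j + 1] - yg[j])
    return ((1 - u) * (1 - v) * Z[j][i] + u * (1 - v) * Z[j][i + 1]
            + (1 - u) * v * Z[j + 1][i] + u * v * Z[j + 1][i + 1])

# hypothetical depth grid z = x + 2y on x = 0, 1, 2 and y = 0, 1
xg, yg = [0.0, 1.0, 2.0], [0.0, 1.0]
Z = [[xv + 2 * yv for xv in xg] for yv in yg]
print(bilinear(xg, yg, Z, 1.5, 0.25))
```

Points outside the grid are extrapolated by this sketch, whereas interp2() returns NaN there.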

3.13 Polynomial Curve Fitting by Least Squares and Persistent Excitation 

Suppose the theoretical (true) relationship between the input x and the output y is known to be

$$ y = x + 2 \qquad (P3.13.1) $$

Charley measured the output y 10 times for the same input value x = 1, using a gauge whose measurement errors have a uniform distribution U[−0.5, +0.5]. He made the following MATLAB program “nm3p13”, which uses the routine “polyfits()” to find a straight line fitting the data.

(a) Check the following program and modify it if needed. Then run the program and see the result. Isn't it beyond your imagination? If you use the MATLAB built-in function “polyfit()”, does it get any better?

%nm3p13.m
th0 = [1 2]; %true parameter
x = ones(1,10); %the unchanged input
y = th0(1)*x + th0(2) + (rand(size(x)) - 0.5);
th_ls = polyfits(x,y,1); %uses the MATLAB routine in Sec.3.8.2
polyfit(x,y,1) %uses MATLAB built-in function


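(b) Note that substituting Eq. (3.8.2) into Eq. (3.8.3) yields

$$ \theta^o = \begin{bmatrix} a^o \\ b^o \end{bmatrix} = [A^T A]^{-1} A^T y = \begin{bmatrix} \sum_{n=0}^{M} x_n^2 & \sum_{n=0}^{M} x_n \\ \sum_{n=0}^{M} x_n & \sum_{n=0}^{M} 1 \end{bmatrix}^{-1} \begin{bmatrix} \sum_{n=0}^{M} x_n y_n \\ \sum_{n=0}^{M} y_n \end{bmatrix} \qquad (P3.13.2) $$

If x_n = c (constant) for all n = 0 : M, is the matrix A^T A invertible?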

(c) What conclusion can you derive based on (a) and (b), with reference to 
the identifiability condition that the input must be rich in some sense 
or persistently exciting? 

(cf) This problem implies that the performance of an identification/estimation scheme, including curve fitting, depends on the characteristics of the input as well as on the choice of algorithm.
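The question in (b) can be checked directly. For the straight-line model, the normal matrix in (P3.13.2) has determinant (M + 1)Σx_n² − (Σx_n)², which vanishes when every x_n equals the same constant. Here is a short Python sketch; the names are illustrative.

```python
def normal_matrix(xs):
    """A^T A for the straight-line model y = a*x + b, as in (P3.13.2)."""
    return [[sum(x * x for x in xs), sum(xs)],
            [sum(xs), float(len(xs))]]

def det2(G):
    return G[0][0] * G[1][1] - G[0][1] * G[1][0]

G_const = normal_matrix([1.0] * 10)                   # Charley's unchanged input
G_rich = normal_matrix([float(n) for n in range(10)])  # a varying input
print(det2(G_const), det2(G_rich))
```

The constant input gives a zero determinant, while the varying input gives a nonzero one, which is the persistent-excitation point made in (c).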

3.14 Scaled Curve Fitting for an Ill-Conditioned Problem [M-2] 

Consider Eq. (P3.13.2), which is a typical least-squares (LS) solution. The matrix A^T A must be inverted to obtain the solution. It may become ill-conditioned if the magnitudes of all the x_n's are too large or too small, i.e., far from 1, because its elements then differ widely in order of magnitude (see Remark 2.3). You will realize something about this issue after solving this problem.

(a) Find a polynomial of degree 2 that fits the four data points (10^6, 1), (1.1 × 10^6, 2), (1.2 × 10^6, 5), and (1.3 × 10^6, 10). Plot the polynomial function (together with the data points) over the interval [10^6, 1.3 × 10^6] to check whether it fits the data points well.
- How big is the relative mismatch error?
- Does the polynomial do the fitting job well?

(b) Find a polynomial of degree 2 that fits the four data points (10^7, 1), (1.1 × 10^7, 2), (1.2 × 10^7, 5), and (1.3 × 10^7, 10). Plot the polynomial function (together with the data points) over the interval [10^7, 1.3 × 10^7] to check whether it fits the data points well.
- How big is the relative mismatch error?
- Does the polynomial do the fitting job well?
- Did you get any warning message in the MATLAB command window? What do you think about it?

(c) If you are not satisfied with the result obtained in (b), why don't you try the scaled curve fitting scheme described below?

1. Transform the x_n's of the data points (x_n, y_n) into the region [−2, 2] by the following relation:

$$ x'_n \leftarrow -2 + \frac{4}{x_{\max} - x_{\min}}\,(x_n - x_{\min}) \qquad (P3.14.1) $$

2. Find the LS polynomial p(x') fitting the data points (x'_n, y_n).

3. Substitute

$$ x' \leftarrow -2 + \frac{4}{x_{\max} - x_{\min}}\,(x - x_{\min}) \qquad (P3.14.2) $$

for x' into p(x').

(cf) You can complete the following program “nm3p14” and run it to get the numeric answers.

%nm3p14.m
clear, clf
format long e
x = 1e6*[1 1.1 1.2 1.3]; y = [1 2 5 10];
xi = x(1) + [0:1000]/1000*(x(end) - x(1));
[p,err,yi] = curve_fit(x,y,0,2,xi); p, err
plot(x,y,'o',xi,yi), hold on
xmin = min(x); xmax = max(x);
x1 = -2 + 4*(x - xmin)/(xmax - xmin);
x1i = ??????????????????????????;
[p1,err,yi] = ?????????????????????????; p1, err
plot(x,y,'o',xi,yi)
%To get the coefficients of the original fitting polynomial
ps1 = poly2sym(p1);
syms x; ps0 = subs(ps1,x,-2 + 4/(xmax - xmin)*(x - xmin));
p0 = sym2poly(ps0)
format short
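The scaled scheme of (P3.14.1)/(P3.14.2) can be sketched in Python as follows. It solves the normal equations in the scaled variable x' only. This is a stand-in for curve_fit(), and the function names are hypothetical.

The four y values happen to lie exactly on a parabola (y = 1 + u² with u = 0, 1, 2, 3). As a result, the scaled fit reproduces them with negligible residual even for the 10^7 data.

```python
def gauss_solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lsq_poly(xs, ys, deg):
    """LS polynomial coefficients [c0, c1, ..., c_deg] (ascending) via normal equations."""
    n = deg + 1
    G = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]  # A^T A
    r = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]            # A^T y
    return gauss_solve(G, r)

def scaled_poly_fit(xs, ys, deg):
    xmin, xmax = min(xs), max(xs)
    def to_scaled(x):                      # the mapping of (P3.14.1)/(P3.14.2)
        return -2 + 4 * (x - xmin) / (xmax - xmin)
    c = lsq_poly([to_scaled(x) for x in xs], ys, deg)
    def p(x):
        xp = to_scaled(x)
        return sum(ck * xp ** k for k, ck in enumerate(c))
    return p, c

xs = [1e7, 1.1e7, 1.2e7, 1.3e7]
ys = [1, 2, 5, 10]
p, c = scaled_poly_fit(xs, ys, 2)
print(c, [p(x) - y for x, y in zip(xs, ys)])
```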
3.15 Weighted Least-Squares Curve Fitting

As in Example 3.7, we want to compare the LS approach and the WLS approach for finding a function that we believe describes the relation between the input x and the output y as

$$ y = a x e^{bx} \qquad (P3.15) $$

where the data pairs (x_m, y_m) are given as

{(1, 3.2908), (5, 3.3264), (9, 1.1640), (13, 0.3515), (17, 0.1140)} from gauge A with error range ±0.1
{(3, 4.7323), (7, 2.4149), (11, 0.3814), (15, −0.2396), (19, −0.2615)} from gauge B with error range ±0.5

This corresponds to the case of Table 3.5(7). Use the MATLAB routine “curve_fit()” for this job and get the result depicted in Fig. P3.15. Identify which of the two lines, a or b, is the WLS fitting curve. How do the results compare?

Figure P3.15 The LS and WLS fitting curves to y = axe^{bx}.


3.16 DFT (Discrete Fourier Transform) Spectrum 

Supplement the part of the MATLAB program “do_fft” (Section 3.9.2) that computes the DFT spectra of the two-tone analog signal described by Eq. (3.9.2) for the cases of zero-padding and whole interval extension, and plot them as in Figs. 3.13c and 3.13d. Which of the four spectra depicted in Fig. 3.13 is the clearest? If you can generalize this, which would you choose among up-sampling, zero-padding, and whole interval extension to get a clear spectrum?
3.17 Effect of Sampling Period, Zero-Padding, and Whole Time Interval on DFT Spectrum

In Section 3.9.2, we experienced the effect of zero-padding, sampling period reduction, and whole interval extension on the DFT spectrum of a two-tone signal, i.e., one that has two distinct frequency components. Here, we investigate the effect of the same three operations on the DFT spectrum of the triangular pulse depicted in Fig. P3.17.1c. Additionally, we will compare the DFT with the CtFT (continuous-time Fourier transform) and the DtFT (discrete-time Fourier transform) [O-1].

(a) The definition of the CtFT, used for getting the spectrum of a continuous-time finite-duration signal x(t), is

$$ X(\omega) = \int_{-\infty}^{\infty} x(t)\,e^{-j\omega t}\,dt \qquad (P3.17.1) $$



(a) Two rectangular pulses   (b) r(t) * r(t) = Λ(t)   (c) x(t) = Λ(t + 2) − Λ(t − 2)

Figure P3.17.1 A triangular pulse as the convolution of two rectangular pulses.

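The CtFT has several useful properties, including the convolution property and the time-shifting property:

$$ x(t) * y(t) \;\xrightarrow{\text{CtFT}}\; X(\omega)Y(\omega) \qquad (P3.17.2) $$

$$ x(t - t_1) \;\xrightarrow{\text{CtFT}}\; X(\omega)\,e^{-j\omega t_1} \qquad (P3.17.3) $$

The triangular pulse is the convolution of two rectangular pulses r(t), each of unit height on [−1, 1], whose CtFTs are

$$ R(\omega) = \text{CtFT}\{r(t)\} = \int_{-1}^{1} e^{-j\omega t}\,dt = 2\,\frac{\sin\omega}{\omega} $$

We can therefore use the convolution property (P3.17.2) to get the CtFT of the triangular pulse as

$$ \text{CtFT}\{\Lambda(t)\} = \text{CtFT}\{r(t) * r(t)\} \overset{(P3.17.2)}{=} R(\omega)R(\omega) = 4\,\frac{\sin^2\omega}{\omega^2} \qquad (P3.17.4) $$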
Figure P3.17.2 Effects of sampling period, zero-padding, and whole interval on DFT spectrum.
Next, use the time-shifting property (P3.17.3) to get the CtFT of

$$ x(t) = \Lambda(t + 2) - \Lambda(t - 2) \qquad (P3.17.5) $$

as

$$ X(\omega) \overset{(P3.17.3),(P3.17.4)}{=} R^2(\omega)\,e^{j2\omega} - R^2(\omega)\,e^{-j2\omega} = j\,8\sin(2\omega)\,\mathrm{sinc}^2\!\left(\frac{\omega}{\pi}\right) \qquad (P3.17.6) $$

where sinc(x) = sin(πx)/(πx). Get the CtFT Y(ω) of the triangular wave generated by repeating x(t) twice, described as

$$ y(t) = x(t + 4) + x(t - 4) \qquad (P3.17.7) $$

Then compare the following plots with Fig. P3.17.2:
- Plot the spectrum X(ω) for 0 ≤ ω ≤ 2π and check whether it matches the solid line in Fig. P3.17.2a or P3.17.2c.
- Plot the spectrum X(ω) for 0 ≤ ω ≤ 4π and check whether it matches the solid line in Fig. P3.17.2b.
- Plot the spectrum Y(ω) for 0 ≤ ω ≤ 2π and check whether it matches the solid line in Fig. P3.17.2d.
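The closed form (P3.17.6) can be cross-checked numerically. The Python sketch below integrates x(t)e^{−jωt} over [−4, 4] with composite Simpson's rule; the breakpoints of the triangles fall on panel boundaries. It then compares the result with j8 sin(2ω)(sin ω/ω)². Function names are illustrative, and ω = 0 is avoided because the closed form is written as a ratio.

```python
import cmath
import math

def tri(t):
    """Lambda(t) = r(t) * r(t): a triangle of height 2 on [-2, 2]."""
    return max(0.0, 2.0 - abs(t))

def x_sig(t):
    """x(t) = Lambda(t + 2) - Lambda(t - 2), Eq. (P3.17.5)."""
    return tri(t + 2) - tri(t - 2)

def ctft_numeric(f, w, a=-4.0, b=4.0, n=8000):
    """Composite Simpson approximation of the integral of f(t) e^{-jwt} over [a, b]."""
    h = (b - a) / n
    s = f(a) * cmath.exp(-1j * w * a) + f(b) * cmath.exp(-1j * w * b)
    for k in range(1, n):
        t = a + k * h
        s += (4 if k % 2 else 2) * f(t) * cmath.exp(-1j * w * t)
    return s * h / 3

def X_closed(w):
    """Closed form (P3.17.6) for w != 0."""
    return 8j * math.sin(2 * w) * (math.sin(w) / w) ** 2

for w in (0.5, 1.3, 3.0):
    print(w, ctft_numeric(x_sig, w), X_closed(w))
```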

(b) The definition of the DtFT, used for getting the spectrum of a discrete-time signal x[n], is

$$ X(\Omega) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-j\Omega n} \qquad (P3.17.8) $$

Use this formula to compute the DtFTs of the discrete-time signals x_a[n], x_b[n], x_c[n], and plot them to see whether the results match the dotted lines in Fig. P3.17.2a-d. What is the valid analog frequency range over which each DtFT spectrum is similar to the corresponding CtFT spectrum? Note that the valid analog frequency range is [−π/T, +π/T] for the sampling period T.

(c) Use the definition (3.9.1a) of the DFT to get the spectra of the discrete-time signals x_a[n], x_b[n], x_c[n], and x_d[n]. Plot them to see whether the results match the dots in Fig. P3.17.2a-d. Do they match the samples of the corresponding DtFTs at Ω_k = 2kπ/N? Among the DFT spectra (a), (b), (c), and (d), which one describes the corresponding CtFT or DtFT spectra over the widest range of analog frequency?
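The following Python check assumes the usual DFT definition X(k) = Σ_{n=0}^{N−1} x[n]e^{−j2πkn/N}. With that definition, the N-point DFT of a finite sequence equals its DtFT (P3.17.8) sampled at Ω_k = 2kπ/N. Zero-padding to 2N points samples the same DtFT twice as densely. The check uses a hypothetical 8-sample triangular sequence.

```python
import cmath
import math

def dtft(x, W):
    """DtFT (P3.17.8) of a finite sequence x[0..N-1] at digital frequency W."""
    return sum(xn * cmath.exp(-1j * W * n) for n, xn in enumerate(x))

def dft(x):
    """N-point DFT X(k) = sum_n x[n] exp(-j 2 pi k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

xs = [0, 1, 2, 3, 2, 1, 0, 0]          # hypothetical triangular samples
for L in (8, 16):                      # plain DFT, then zero-padded to 16 points
    X = dft(xs + [0] * (L - len(xs)))
    gap = max(abs(X[k] - dtft(xs, 2 * math.pi * k / L)) for k in range(L))
    print(L, gap)
```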

3.18 Windowing Techniques Against the Leakage of DFT Spectrum 

Several window functions are available for alleviating the spectrum leakage problem or for other purposes. We have made a MATLAB routine “windowing()” for easy application of the various windows. Do the following:
- Apply the Hamming window function to the discrete-time signal x_d[n] in Fig. 3.13d and get the new DFT spectrum.
- Plot its magnitude together with the windowed signal, and check whether they are the same as depicted in Fig. P3.18b.
- Compare it with the old DFT spectrum in Fig. 3.13d or Fig. P3.18a.

You can start with the incomplete MATLAB program “nm3p18.m” below. What is the effect of windowing on the spectrum?



(a) Rectangular window   (b) Bartlett/triangular windowing   (signal x_d[n] plotted versus t = nT)



function xw = windowing(x,w)
%apply the window specified by w to the row vector x:
% 'rt' (rectangular, default), 'bt' (Bartlett), 'bk' (Blackman), 'hm' (Hamming)
N = length(x);
if nargin < 2 || isempty(w) || strcmp(w,'rt'), xw = x;
elseif strcmp(w,'bt'), xw = x.*bartlett(N)';
elseif strcmp(w,'bk'), xw = x.*blackman(N)';
elseif strcmp(w,'hm'), xw = x.*hamming(N)';
end

%nm3p18: windowing effect on DFT spectrum
w1 = 1.5*pi; w2 = 3*pi; %two tones
N = 64; n = 1:N; T = 0.1; t = (n - 1)*T;
k = 1:N; w0 = 2*pi/T; w = (k - 1)*w0;
xbn = sin(w1*t) + 0.5*sin(w2*t);
xbwn = windowing(xbn,'bt');
Xb = fft(xbn); Xbw = fft(xbwn);
subplot(421), stem(t,xbn,'.')
subplot(423), stem(k,abs(Xb),'.')
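The Python sketch below illustrates the leakage reduction asked about above without MATLAB. It does the following:
- builds the same two-tone signal as nm3p18 (N = 64, T = 0.1);
- applies a symmetric triangular (Bartlett-type) window with zero end points;
- sums the DFT magnitudes over bins 20 to 32, far from both tones, which fall between bins near k = 4.8 and 9.6.

The window formula and the choice of bins are our assumptions.

```python
import cmath
import math

def dft(x):
    """N-point DFT X(k) = sum_n x[n] exp(-j 2 pi k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def triangular_window(N):
    """Symmetric triangular window with zero end points: 1 - |2n/(N-1) - 1|."""
    return [1 - abs(2 * n / (N - 1) - 1) for n in range(N)]

N, T = 64, 0.1
w1, w2 = 1.5 * math.pi, 3 * math.pi                  # the two tones of nm3p18
t = [n * T for n in range(N)]
x = [math.sin(w1 * tn) + 0.5 * math.sin(w2 * tn) for tn in t]
xw = [xn * wn for xn, wn in zip(x, triangular_window(N))]
X, Xw = dft(x), dft(xw)
far = range(20, 33)                                  # bins far from both tones
leak_rect = sum(abs(X[k]) for k in far)
leak_tri = sum(abs(Xw[k]) for k in far)
print(leak_rect, leak_tri)
```

The windowed spectrum shows much less leakage far from the tones, at the price of broader main lobes around them.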
3.19 Interpolation by Using DFS: Zero-Padding on the Frequency Domain

The fitting curve in Fig. 3.14d was obtained by zeroing out all the digital frequency components of the sequence x[n] in Fig. 3.14a that are higher than π/2 [rad] (N/4 < k < 3N/4). Plot another fitting curve, obtained by removing all the frequency components higher than π/4 [rad] (N/8 < k < 7N/8), and compare it with Fig. 3.14d.

3.20 On-Line Recursive Computation of DFT 

Suppose you need to compute the DFT of a block of data every time a newly sampled datum replaces the oldest one in the block. For this case, we derive the following recursive algorithm for DFT computation.

Define the first data block and the mth data block as

{x_0[0], x_0[1], ..., x_0[N − 1]} = {0, 0, ..., 0}   (P3.20.1)

{x_m[0], x_m[1], ..., x_m[N − 1]} = {x[m], x[m + 1], ..., x[m + N − 1]}   (P3.20.2)

Then the DFT for the (m + 1)th data block

{x_{m+1}[0], x_{m+1}[1], ..., x_{m+1}[N − 1]} = {x[m + 1], x[m + 2], ..., x[m + N]}   (P3.20.3)

can be expressed in terms of the DFT for the mth data block

$$ X_m(k) = \sum_{n=0}^{N-1} x_m[n]\,e^{-j2\pi nk/N} \qquad (P3.20.4) $$

as follows:

$$ X_{m+1}(k) = \sum_{n=0}^{N-1} x_{m+1}[n]\,e^{-j2\pi nk/N} = \sum_{n=0}^{N-1} x_m[n+1]\,e^{-j2\pi nk/N} = \sum_{n=0}^{N-1} x_m[n+1]\,e^{-j2\pi (n+1)k/N}\,e^{j2\pi k/N} $$

$$ = \sum_{n=1}^{N} x_m[n]\,e^{-j2\pi nk/N}\,e^{j2\pi k/N} = \left\{\sum_{n=0}^{N-1} x_m[n]\,e^{-j2\pi nk/N} + x_m[N] - x_m[0]\right\} e^{j2\pi k/N} $$

$$ = \{X_m(k) + x[m + N] - x[m]\}\,e^{j2\pi k/N} \qquad (P3.20.5) $$

Here x_m[N] = x[m + N] is the newly sampled datum, and x_m[0] = x[m] is the oldest one, which is discarded. You can compute the 128-point DFT for a block composed of 128 random numbers by using this RDFT algorithm. Compare it with the DFT obtained by using the MATLAB built-in routine “fft()”. You can start with the incomplete MATLAB program “do_RDFT.m” below.


%do_RDFT
clear, clf
N = 128; k = [0:N - 1];
x = zeros(1,N); %initialize the data block
Xr = zeros(1,N); % and its DFT
for m = 0:N
  xN = rand; %new data
  Xr = (Xr + xN - x(1)).*???????????????? %RDFT formula (P3.20.5)
  x = [x(2:N) xN];
end
dif = norm(Xr-fft(x)) %difference between RDFT and FFT
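Here is a Python sketch of the RDFT update (P3.20.5). It uses a direct O(N²) DFT only for the final comparison. The helper names are illustrative, and the random data are seeded for repeatability.

```python
import cmath
import math
import random

def dft(x):
    """Direct N-point DFT, used only to check the recursive result."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

N = 128
random.seed(1)
x = [0.0] * N                                 # first data block (P3.20.1)
Xr = [0j] * N                                 # and its DFT
rot = [cmath.exp(2j * math.pi * k / N) for k in range(N)]
for m in range(N + 1):
    xN = random.random()                      # newly sampled datum x[m + N]
    Xr = [(Xr[k] + xN - x[0]) * rot[k] for k in range(N)]   # update (P3.20.5)
    x = x[1:] + [xN]                          # slide the block by one sample
dif = max(abs(a - b) for a, b in zip(Xr, dft(x)))
print(dif)
```

Each update costs O(N) operations instead of the O(N log N) of a fresh FFT, which is the point of the recursive form.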
4.1 ITERATIVE METHOD TOWARD FIXED POINT

Let's see the following theorem.

Fixed-Point Theorem: Contraction Theorem [K-2, Section 5.1]. Suppose a function g(x) is defined and its first derivative g'(x) exists continuously on some interval I = [x° − r, x° + r] around the fixed point x° of g(x) such that

$$ g(x^o) = x^o \qquad (4.1.1) $$

Suppose further that the absolute value of g'(x) is less than or equal to a positive number α that is strictly less than one, that is,

$$ |g'(x)| \le \alpha < 1 \qquad (4.1.2) $$

Then the iteration starting from any point x_0 ∈ I

$$ x_{k+1} = g(x_k) \quad \text{with } x_0 \in I \qquad (4.1.3) $$

converges to the (unique) fixed point x° of g(x).


Applied Numerical Methods Using MATLAB®, by Yang, Cao, Chung, and Morris
Proof. The Mean Value Theorem (MVT) (Appendix A) says that for any two points x_0 and x°, there exists a point z between them such that

$$ g(x_0) - g(x^o) = g'(z)(x_0 - x^o); \qquad x_1 - x^o \overset{(4.1.3),(4.1.1)}{=} g'(z)(x_0 - x^o) \qquad (1) $$

Taking the absolute value of both sides of (1) and using the precondition (4.1.2) yields

$$ |x_1 - x^o| \le \alpha\,|x_0 - x^o| < |x_0 - x^o| \qquad (2) $$

This implies that x_1 is closer to x° than x_0 is, and thus still stays inside the interval I. Applying this successively, we get

$$ |x_k - x^o| \le \alpha\,|x_{k-1} - x^o| \le \alpha^2\,|x_{k-2} - x^o| \le \cdots \le \alpha^k\,|x_0 - x^o| \to 0 \ \text{ as } k \to \infty \qquad (3) $$

which implies that the iterative sequence {x_k} generated by (4.1.3) converges to x°.

(Q) Is there any possibility that the fixed point is not unique? That is, could more than one point satisfy Eq. (4.1.1), so that the iterative scheme gets confused among several fixed points?

(A) It can never happen, because any two points x°1 and x°2 satisfying Eq. (4.1.1) must be the same:

$$ |x^{o1} - x^{o2}| = |g(x^{o1}) - g(x^{o2})| \le \alpha\,|x^{o1} - x^{o2}| \ (\alpha < 1); \quad \therefore\ |x^{o1} - x^{o2}| = 0; \quad x^{o1} = x^{o2} $$


In order to solve a nonlinear equation f(x) = 0 using the iterative method based on this fixed-point theorem, we must somehow arrange the equation into the form

$$ x = g(x) \qquad (4.1.4) $$

We then start the iteration (4.1.3) with an initial value x_0 and continue until some stopping criterion is satisfied. For example, we stop when the difference |x_{k+1} − x_k| between successive iteration values becomes smaller than some predefined number (TolX), or when the number of iterations exceeds some predetermined number (MaxIter).

This scheme is cast into the MATLAB routine “fixpt()”. Note that its second output argument (err) is never the real error, i.e., the distance to the true solution. It is just the last value of |x_{k+1} − x_k|, used as an error estimate. See the following remark and examples.

Remark 4.1. Fixed-Point Iteration. Noting that Eq. (4.1.4) is not unique for a given $f(x) = 0$, it would be good to choose $g(x)$ such that $|g'(x)| < 1$ inside the interval $I$ containing its fixed point $x^o$, which is the solution we are looking for. It may not be so easy, however, to determine whether $|g'(x)| < 1$ is satisfied around the solution point if we don't have any rough estimate of the solution.



Example 4.1. Fixed-Point Iteration. Consider the problem of solving the nonlinear equation

$$ f_{41}(x) = x^2 - 2 = 0 \tag{E4.1.1} $$

In order to apply the fixed-point iteration for solving this equation, we need to convert it into a form like (4.1.4). Let's try the following three forms and guess that the solution is in the interval $I = (1, 1.5)$.

(a) How about $x^2 - 2 = 0 \rightarrow x^2 = 2 \rightarrow x = 2/x = g_a(x)$? (E4.1.2)

Let's see if the absolute value of the first derivative of $g_a(x)$ is less than one on the solution interval, that is, $|g_a'(x)| = 2/x^2 < 1 \ \forall x \in I$. This condition is not satisfied ($2/x^2 > 1$ for $x < \sqrt{2}$), and so we must be pessimistic about the possibility of reaching the solution with (E4.1.2). We don't need many iterations to confirm this:

$$ x_0 = 1 \rightarrow x_1 = 2/x_0 = 2 \rightarrow x_2 = 2/x_1 = 1 \rightarrow x_3 = 2 \rightarrow x_4 = 1 \rightarrow \cdots \tag{E4.1.3} $$

The iteration turned out to be swaying between 1 and 2, never approaching the solution.

(b) How about $x^2 - 2 = 0 \rightarrow (x - 1)^2 + 2x - 3 = 0 \rightarrow x = -\frac{1}{2}\{(x - 1)^2 - 3\} = g_b(x)$? (E4.1.4)

This form seems to satisfy the convergence condition

$$ |g_b'(x)| = |x - 1| \le 0.5 < 1 \quad \forall x \in I \tag{E4.1.5} $$
and so we may be optimistic about the possibility of reaching the solution with (E4.1.4). To confirm this, we need just a few iterations, which can be performed by using the routine “fixpt()”.

>>gb = inline('-((x-1).^2-3)/2','x');
>>[x,err,xx] = fixpt(gb,1,1e-4,50);
1.0000 1.5000 1.3750 1.4297 1.4077

The iteration is obviously converging to the true solution $\sqrt{2} = 1.414\ldots$, which we already know in this case. This process is depicted in Fig. 4.1a.

(c) How about $x^2 = 2 \rightarrow x = \frac{2}{x} \rightarrow x + x = \frac{2}{x} + x \rightarrow x = \frac{1}{2}\left(x + \frac{2}{x}\right) = g_c(x)$? (E4.1.6)

This form seems to satisfy the convergence condition

$$ |g_c'(x)| = \frac{1}{2}\left|1 - \frac{2}{x^2}\right| \le 0.5 < 1 \quad \forall x \in I \tag{E4.1.7} $$

which guarantees that the iteration will reach the solution. Moreover, since this derivative becomes zero at the solution of $x^2 = 2$, we may expect fast convergence, which is confirmed by using the routine “fixpt()”. The process is depicted in Fig. 4.1b.

>>gc = inline('(x+2./x)/2','x');
>>[x,err,xx] = fixpt(gc,1,1e-4,50);
1.0000 1.5000 1.4167 1.4142 1.4142
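The routine “fixpt()” is not listed in this section, so as a hedged illustration the Python sketch below re-implements the stopping rule described above (stop when |x_{k+1} - x_k| < TolX or after MaxIter steps) and runs the three forms g_a, g_b, and g_c of Example 4.1 from x_0 = 1. The name fixpt and its argument order are assumptions chosen to mirror the MATLAB calls; this is not the book's code.

```python
import math

def fixpt(g, x0, tol_x=1e-4, max_iter=50):
    """Fixed-point iteration x_{k+1} = g(x_k) (Eq. 4.1.3).

    Stops when |x_{k+1} - x_k| < tol_x or after max_iter steps.
    Returns (x, err, xx): the last iterate, the last |x_{k+1} - x_k|
    (only an error *estimate*), and the history of iterates.
    """
    xx = [x0]
    err = math.inf
    for _ in range(max_iter):
        x_next = g(xx[-1])
        err = abs(x_next - xx[-1])
        xx.append(x_next)
        if err < tol_x:
            break
    return xx[-1], err, xx

# The three rearrangements of x^2 - 2 = 0 tried in Example 4.1
forms = {
    "g_a": lambda x: 2.0 / x,                      # (E4.1.2)
    "g_b": lambda x: -((x - 1.0) ** 2 - 3.0) / 2,  # (E4.1.4)
    "g_c": lambda x: (x + 2.0 / x) / 2,            # (E4.1.6)
}

results = {}
for name, g in forms.items():
    x, err, xx = fixpt(g, 1.0)
    results[name] = (x, err, xx)
    print(f"{name}: {len(xx) - 1:2d} steps, x = {x:.6f}, "
          f"first iterates = {[round(v, 4) for v in xx[:5]]}")
```

With these settings, g_a keeps swaying between 1 and 2 until the iteration limit, g_b creeps toward √2 with the error shrinking by roughly |g_b'(√2)| = √2 - 1 ≈ 0.41 per step, and g_c stops after four steps, consistent with g_c'(√2) = 0.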

(cf) In fact, if the nonlinear equation that we must solve is a polynomial equation, then it is convenient to use the MATLAB built-in command “roots()”.



Figure 4.1 Iterative method to solve nonlinear equations based on the fixed-point theorem: (a) $x_{k+1} = g_b(x_k) = -\frac{1}{2}\{(x_k - 1)^2 - 3\}$; (b) $x_{k+1} = g_c(x_k) = \frac{1}{2}\left(x_k + \frac{2}{x_k}\right)$.
(Q) How do we make the iteration converge to the other solution $x = -\sqrt{2}$ of $x^2 - 2 = 0$?

4.2 BISECTION METHOD 

The bisection method can be applied for solving nonlinear equations like $f(x) = 0$ only in the case where we know some interval $[a, b]$ on which $f(x)$ is continuous and the solution uniquely exists and, most importantly, $f(a)$ and $f(b)$ have opposite signs. The procedure toward the solution of $f(x) = 0$ is described as follows and is cast into the MATLAB routine “bisct()”.

Step 0. Initialize the iteration number k = 0.

Step 1. Let $m = \frac{1}{2}(a + b)$. If $f(m) \approx 0$ or $\frac{1}{2}(b - a) \approx 0$, then stop the iteration.
Step 2. If $f(a)f(m) > 0$, then let $a \leftarrow m$; otherwise, let $b \leftarrow m$. Go back to step 1.


function [x,err,xx] = bisct(f,a,b,TolX,MaxIter)
%bisct.m to solve f(x) = 0 by using the bisection method.
%input : f = ftn to be given as a string 'f' if defined in an M-file
%        a/b = initial left/right point of the solution interval
%        TolX = upperbound of error |x(k) - xo|
%        MaxIter = maximum # of iterations
%output: x = point which the algorithm has reached
%        err = (b - a)/2 (half the last interval width)
%        xx = history of x
TolFun = eps; fa = feval(f,a); fb = feval(f,b);
if fa*fb > 0, error('We must have f(a)f(b)<0!'); end
for k = 1:MaxIter
   xx(k) = (a + b)/2;
   fx = feval(f,xx(k)); err = (b - a)/2;
   if abs(fx) < TolFun | abs(err) < TolX, break;
   elseif fx*fa > 0, a = xx(k); fa = fx; %solution lies in [xx(k), b]
   else b = xx(k); %solution lies in [a, xx(k)]
   end
end
x = xx(k);
if k == MaxIter, fprintf('The best in %d iterations\n',MaxIter), end


Remark 4.2. Bisection Method Versus Fixed-Point Iteration

1. As long as the solution exists on some interval $[a, b]$, the distance from the midpoint $(a + b)/2$ of the interval (taken as an approximate solution) to the true solution is at most one-half of the interval width, that is, $(b - a)/2$, which we take as a measure of error. Therefore, for every iteration of the bisection method, the upper bound of the error in the approximate solution decreases by half.

2. The bisection method and the false position method appearing in the next section will definitely give us the solution, provided that the solution exists uniquely in some known interval. But the convergence of the fixed-point iteration depends on the derivative of $g(x)$ as well as on the initial value $x_0$.

3. The MATLAB built-in routine fzero(f,x) finds a zero of the function given as the first input argument, based on interpolation and the bisection method, with the initial solution interval vector x = [a b] given as the second input argument. The routine is supposed to work even with an initial guess x = $x_0$ of the (scalar) solution, but it sometimes gives us a wrong result, as illustrated in the following example. Therefore, it is safer to use the routine fzero() with the initial solution interval vector [a b] as the second input argument.

Example 4.2. Bisection Method. Consider the problem of solving the nonlinear equation

$$ f_{42}(x) = \tan(\pi - x) - x = 0 \tag{E4.2.1} $$

Noting that $f_{42}(x)$ has the value of infinity at $x = \pi/2 = 1.57\ldots$, we set the initial solution interval to [1.6, 3], excluding the singular point, and use the MATLAB routine “bisct()” as follows. The iteration seems to be converging to the solution as we expect (see Fig. 4.2b).

>>f42 = inline('tan(pi - x)-x','x');
>>[x,err,xx] = bisct(f42,1.6,3,1e-4,50);
2.3000 1.9500 2.1250 2.0375 1.9937 2.0156 ... 2.0287

But, if we start with an initial solution interval [a, b] such that f(a) and f(b) have the same sign, we will get an error message.

>>[x,err,xx] = bisct(f42,1.5,3,1e-4,50);
??? Error using ==> bisct
We must have f(a)f(b)<0!

Now, let's see how the MATLAB built-in routine fzero(f,x) works.

>>fzero(f42,[1.6 3])
ans = 2.0287 %good job!
>>fzero(f42,[1.5 3])
??? Error using ==> fzero
The function values at interval endpoints must differ in sign.
>>fzero(f42,1.8) %with an initial guess as 2nd input argument
ans = 1.5708 %wrong result with no warning message

(cf) Not all the solutions given by computers are good, especially when we are careless.
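A Python sketch of the same bisection loop may help in checking the numbers above. It mirrors the MATLAB “bisct()” (midpoint, err = (b - a)/2, TolFun = eps) but is an illustrative re-implementation, not the book's code. The second call reproduces the trap behind the fzero(f42,1.8) result: the sign change between 1.5 and 2.0 comes from the pole at π/2, not from a root, so the "solution" has a huge residual.

```python
import math
import sys

def bisct(f, a, b, tol_x=1e-4, max_iter=50):
    """Bisection as in bisct(): returns (x, err, xx) with err = (b - a)/2."""
    tol_fun = sys.float_info.epsilon          # plays the role of MATLAB's eps
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("We must have f(a)f(b)<0!")
    xx = []
    for _ in range(max_iter):
        x = (a + b) / 2
        xx.append(x)
        fx = f(x)
        err = (b - a) / 2
        if abs(fx) < tol_fun or abs(err) < tol_x:
            break
        if fx * fa > 0:
            a, fa = x, fx                      # root lies in [x, b]
        else:
            b = x                              # root lies in [a, x]
    return x, err, xx

def f42(x):
    return math.tan(math.pi - x) - x           # (E4.2.1)

x, err, xx = bisct(f42, 1.6, 3)
print([round(v, 4) for v in xx[:6]], "->", round(x, 4), "after", len(xx), "iterations")

# A "bracket" whose sign change comes from the pole at pi/2, not from a root
x_bad, _, _ = bisct(f42, 1.5, 2.0)
print(f"x = {x_bad:.4f}, f(x) = {f42(x_bad):.3g}")
```

In line with Remark 4.2, the error bound after k iterations on [1.6, 3] is 1.4/2^k, so TolX = 1e-4 is reached at k = 14; evaluating |f(x)| afterwards is what exposes the bogus answer from [1.5, 2.0].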
(a) Process of the bisection method

 k    a_k     x_k      b_k      f(x_k)
 0    1.6              3.0      f(a_0) = 32.6, f(b_0) = -2.86
 1    1.6     2.3      3.0      -1.1808
 2    1.6     1.95     2.3       0.5595
 3    1.95    2.125    2.3      -0.5092
 4    1.95    2.0375   2.125    -0.0527

(b) The graph of $f(x) = \tan(\pi - x) - x$

Figure 4.2 Bisection method for Example 4.2.


4.3 FALSE POSITION OR REGULA FALSI METHOD 


Similarly to the bisection method, the false position or regula falsi method starts with an initial solution interval $[a, b]$ that is believed to contain the solution of $f(x) = 0$. Approximating the curve of $f(x)$ on $[a, b]$ by a straight line connecting the two points $(a, f(a))$ and $(b, f(b))$, it guesses that the solution may be the point at which the straight line crosses the x-axis:

$$ x = b - \frac{f(b)}{f(b) - f(a)}(b - a) = \frac{a f(b) - b f(a)}{f(b) - f(a)} \tag{4.3.1} $$


function [x,err,xx] = falsp(f,a,b,TolX,MaxIter)
%falsp.m to solve f(x)=0 by using the false position method.
%input : f = ftn to be given as a string 'f' if defined in an M-file
%        a/b = initial left/right point of the solution interval
%        TolX = upperbound of error(max(|x(k)-a|,|b-x(k)|))
%        MaxIter = maximum # of iterations
%output: x = point which the algorithm has reached
%        err = max(|x(last)-a|,|b-x(last)|)
%        xx = history of x
TolFun = eps; fa = feval(f,a); fb = feval(f,b);
if fa*fb > 0, error('We must have f(a)f(b)<0!'); end
for k = 1:MaxIter
   xx(k) = (a*fb - b*fa)/(fb - fa); %Eq.(4.3.1)
   fx = feval(f,xx(k));
   err = max(abs(xx(k) - a),abs(b - xx(k)));
   if abs(fx) < TolFun | err < TolX, break;
   elseif fx*fa > 0, a = xx(k); fa = fx; %solution lies in [xx(k), b]
   else b = xx(k); fb = fx; %solution lies in [a, xx(k)]
   end
end
x = xx(k);
if k == MaxIter, fprintf('The best in %d iterations\n',MaxIter), end
Figure 4.3 Solving the nonlinear equation $f(x) = \tan(\pi - x) - x = 0$.


For this method, we take the larger of $|x - a|$ and $|b - x|$ as the measure of error. This procedure to search for the solution of $f(x) = 0$ is cast into the MATLAB routine “falsp()”.

Note that although the false position method aims to improve the convergence speed over the bisection method, it cannot always achieve that goal, especially when the curve of $f(x)$ on $[a, b]$ is not well approximated by a straight line, as depicted in Fig. 4.3. Figure 4.3b shows how the false position method approaches the solution, started by typing the following MATLAB statement, while Fig. 4.3a shows the footprints of the bisection method.

>>[x,err,xx] = falsp(f42,1.7,3,1e-4,50) %with initial interval [1.7,3]
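The Python sketch below mirrors “falsp()” (Eq. (4.3.1) plus the error measure max(|x - a|, |b - x|)); it is an illustrative re-implementation, not the book's code, and it additionally returns the final left end a for inspection. Running it on [1.7, 3] shows one way the method can disappoint: since f42 is convex and decreasing on this interval, every chord crosses the x-axis to the right of the root, so only b moves while a stays at 1.7.

```python
import math
import sys

def falsp(f, a, b, tol_x=1e-4, max_iter=50):
    """False position as in falsp(); err = max(|x - a|, |b - x|).

    Returns (x, err, xx, a) where a is the final left end of the interval.
    """
    tol_fun = sys.float_info.epsilon
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("We must have f(a)f(b)<0!")
    xx = []
    for _ in range(max_iter):
        x = (a * fb - b * fa) / (fb - fa)       # Eq. (4.3.1)
        xx.append(x)
        fx = f(x)
        err = max(abs(x - a), abs(b - x))
        if abs(fx) < tol_fun or err < tol_x:
            break
        if fx * fa > 0:
            a, fa = x, fx                       # root lies in [x, b]
        else:
            b, fb = x, fx                       # root lies in [a, x]
    return x, err, xx, a

def f42(x):
    return math.tan(math.pi - x) - x

x, err, xx, a_final = falsp(f42, 1.7, 3)
print(f"x = {x:.6f}, err = {err:.4f}, iterations = {len(xx)}, final a = {a_final}")
```

Here x converges (only linearly) to about 2.0288, but err stays near 0.33 because the left end never moves, so the loop runs to the iteration limit even though x itself has settled.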


4.4 NEWTON(-RAPHSON) METHOD 

Consider the problem of numerically finding one of the solutions, $x^o$, of a nonlinear equation

$$ f(x) = (x - x^o)^m g(x) = 0 $$

where $f(x)$ has $(x - x^o)^m$ ($m$ is an even number) as a factor, so that its curve is tangent to the x-axis without crossing it at $x = x^o$. In this case, the signs of $f(x^o - \varepsilon)$ and $f(x^o + \varepsilon)$ are the same, and we cannot find any interval $[a, b]$ containing only $x^o$ as a solution such that $f(a)f(b) < 0$. Consequently, bracketing methods such as the bisection or false position method are not applicable to this problem. Neither can the MATLAB built-in routine fzero() be applied to solve as simple an equation as $x^2 = 0$, which you would not believe until you try it for yourself. Then, how do we solve it? The Newton(-Raphson) method can be used for this kind of problem as well as for general nonlinear equation problems, provided that the first derivative of $f(x)$ exists and is continuous around the solution.

The strategy behind the Newton(-Raphson) method is to approximate the curve of $f(x)$ by its tangent line at some estimate $x_k$

$$ y - f(x_k) = f'(x_k)(x - x_k) \tag{4.4.1} $$

and set the zero (crossing of the x-axis) of the tangent line as the next estimate $x_{k+1}$:

$$ 0 - f(x_k) = f'(x_k)(x_{k+1} - x_k); \qquad x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} \tag{4.4.2} $$


This Newton iterative formula is cast into the MATLAB routine “newton()”, which is designed to generate the numerical derivative (Chapter 5) in the case where the derivative function is not given as the second input argument.

Here, for the error analysis of the Newton method, we consider the second-degree Taylor polynomial (Appendix A) of $f(x)$ about $x = x_k$:

$$ f(x) \approx f(x_k) + f'(x_k)(x - x_k) + \frac{f''(x_k)}{2}(x - x_k)^2 $$


function [x,fx,xx] = newton(f,df,x0,TolX,MaxIter)
%newton.m to solve f(x) = 0 by using Newton method.
%input: f = ftn to be given as a string 'f' if defined in an M-file
%       df = df(x)/dx (If not given, numerical derivative is used.)
%       x0 = the initial guess of the solution
%       TolX = the upper limit of |x(k) - x(k-1)|
%       MaxIter = the maximum # of iteration
%output: x = the point which the algorithm has reached
%        fx = f(x(last)), xx = the history of x
h = 1e-4; h2 = 2*h; TolFun = eps;
%if df is omitted, shift the arguments: newton(f,x0,TolX,MaxIter)
if nargin == 4 & isnumeric(df), MaxIter = TolX; TolX = x0; x0 = df; end
xx(1) = x0; fx = feval(f,x0);
for k = 1:MaxIter
   if ~isnumeric(df), dfdx = feval(df,xx(k)); %derivative function
   else dfdx = (feval(f,xx(k) + h) - feval(f,xx(k) - h))/h2; %numerical drv
   end
   dx = -fx/dfdx;
   xx(k + 1) = xx(k) + dx; %Eq.(4.4.2)
   fx = feval(f,xx(k + 1));
   if abs(fx) < TolFun | abs(dx) < TolX, break; end
end
x = xx(k + 1);
if k == MaxIter, fprintf('The best in %d iterations\n',MaxIter), end
We substitute $x = x^o$ (the solution) into this and use $f(x^o) = 0$ to write

$$ 0 = f(x^o) \approx f(x_k) + f'(x_k)(x^o - x_k) + \frac{f''(x_k)}{2}(x^o - x_k)^2 $$

and

$$ -f(x_k) \approx f'(x_k)(x^o - x_k) + \frac{f''(x_k)}{2}(x^o - x_k)^2 $$

Substituting this into Eq. (4.4.2) and defining the error of the estimate $x_k$ as $e_k = x_k - x^o$, we get

$$ x_{k+1} \approx x_k + (x^o - x_k) + \frac{f''(x_k)}{2f'(x_k)}(x^o - x_k)^2, $$

$$ |e_{k+1}| \approx \left|\frac{f''(x_k)}{2f'(x_k)}\right| e_k^2 = A_k e_k^2 = |A_k e_k|\,|e_k| \tag{4.4.3} $$

This implies that once the magnitude of the initial estimation error $|e_0|$ is small enough to make $|A e_0| < 1$, the magnitudes of successive estimation errors get smaller very quickly, so long as $A_k$ does not become large. The Newton method is said to be ‘quadratically convergent’ on account of the fact that the magnitude of the estimation error is proportional to the square of the previous estimation error.
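To see Eq. (4.4.3) at work, the Python sketch below (an illustration, not from the book) applies Eq. (4.4.2) to f(x) = x^2 - 2, for which the Newton update x - (x^2 - 2)/(2x) coincides with g_c(x) of Example 4.1(c). For this particular f the relation is exact, e_{k+1} = e_k^2/(2x_k), so the ratio |e_{k+1}|/e_k^2 takes the values 1/2, 1/3, 6/17, ... and approaches A = |f''/(2f')| = 1/(2√2) ≈ 0.354 at the root.

```python
import math

# Newton's method (Eq. 4.4.2) on f(x) = x^2 - 2 with f'(x) = 2x and f''(x) = 2
f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x

root = math.sqrt(2.0)
x = 1.0
errors = [abs(x - root)]
for _ in range(3):
    x = x - f(x) / df(x)
    errors.append(abs(x - root))

A = abs(2.0 / (2.0 * df(root)))      # |f''/(2 f')| at the root = 1/(2*sqrt(2))
ratios = [errors[k + 1] / errors[k] ** 2 for k in range(3)]
for k, r in enumerate(ratios):
    print(f"e_{k} = {errors[k]:.3e}   |e_{k + 1}|/e_{k}^2 = {r:.4f}")
print(f"A at the root = {A:.4f}")
```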

Now, it is time to practice using the MATLAB routine “newton()” for solving a nonlinear equation like that dealt with in Example 4.2. We have to type the following statements into the MATLAB command window.

>>x0 = 1.8; TolX = 1e-5; MaxIter = 50; %with initial guess 1.8,...
>>[x,err,xx] = newton(f42,x0,1e-5,50) %with numerical derivative
>>df42 = inline('-(sec(pi-x)).^2-1','x'); %1st order derivative
>>[x,err,xx1] = newton(f42,df42,1.8,1e-5,50)
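The same two calls can be mimicked in Python as a hedged sketch. The function newton() below is a re-implementation that takes the derivative as an optional keyword argument df (an assumption of this sketch; the MATLAB routine detects a missing derivative through nargin) and otherwise falls back on the central difference (f(x + h) - f(x - h))/(2h) with h = 1e-4, as in the MATLAB code.

```python
import math
import sys

def newton(f, x0, df=None, tol_x=1e-5, max_iter=50, h=1e-4):
    """Newton(-Raphson) iteration, Eq. (4.4.2).

    If df is None, the derivative is replaced by the central difference
    (f(x + h) - f(x - h)) / (2h), as in the MATLAB routine newton().
    Returns (x, f(x), history).
    """
    xx = [x0]
    fx = f(x0)
    for _ in range(max_iter):
        xk = xx[-1]
        dfdx = df(xk) if df is not None else (f(xk + h) - f(xk - h)) / (2 * h)
        dx = -fx / dfdx
        xx.append(xk + dx)
        fx = f(xx[-1])
        if abs(fx) < sys.float_info.epsilon or abs(dx) < tol_x:
            break
    return xx[-1], fx, xx

def f42(x):
    return math.tan(math.pi - x) - x

def df42(x):
    return -1.0 / math.cos(math.pi - x) ** 2 - 1.0   # -sec^2(pi - x) - 1

x_num, fx_num, xx_num = newton(f42, 1.8)            # numerical derivative
x_ana, fx_ana, xx_ana = newton(f42, 1.8, df=df42)   # analytical derivative
print(f"numerical : x = {x_num:.6f} after {len(xx_num) - 1} steps")
print(f"analytical: x = {x_ana:.6f} after {len(xx_ana) - 1} steps")
```

Both runs from x_0 = 1.8 should settle near 2.0288, the root found by bisection in Example 4.2, within a handful of iterations.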

Remark 4.3. Newton(-Raphson) Method

1. While bracketing methods such as the bisection method and the false position method converge in all cases (given a valid initial bracket), the Newton method is guaranteed to converge only in the case where the initial value $x_0$ is sufficiently close to the solution $x^o$ and $A(x) = |f''(x)/(2f'(x))|$ is sufficiently small for $x \approx x^o$. Apparently, it is good for fast convergence if we have small $A(x)$, that is, if the relative magnitude of the second-order derivative $|f''(x)|$ over $|f'(x)|$ is small. In other words, the convergence of the Newton method is endangered if the slope of $f(x)$ is too flat or fluctuates too sharply.

2. Note two drawbacks of the Newton(-Raphson) method. One is the effort and time required to compute the derivative $f'(x_k)$ at each iteration; the other is the possibility of going astray, especially when $f(x)$ has an abruptly changing slope around the solution (e.g., Fig. 4.4c or 4.4d), whereas it converges to the solution quickly when $f(x)$ has a steady slope, as illustrated in Figs. 4.4a and 4.4b.

Figure 4.4 Solving nonlinear equations f(x) = 0 by using the Newton method.


4.5 SECANT METHOD 

The secant method can be regarded as a modification of the Newton method in the sense that the derivative is replaced by a difference approximation based on the successive estimates

$$ f'(x_k) \approx \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}} \tag{4.5.1} $$

which is expected to take less time than computing the analytical or numerical derivative. By this approximation, the iterative formula (4.4.2) becomes

$$ x_{k+1} = x_k - \frac{f(x_k)}{dfdx_k} \quad \text{with} \quad dfdx_k = \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}} \tag{4.5.2} $$
function [x,fx,xx] = secant(f,x0,TolX,MaxIter,varargin)
% solve f(x) = 0 by using the secant method.
%input : f = ftn to be given as a string 'f' if defined in an M-file
%        x0 = the initial guess of the solution
%        TolX = the upper limit of |x(k) - x(k - 1)|
%        MaxIter = the maximum # of iteration
%output: x = the point which the algorithm has reached
%        fx = f(x(last)), xx = the history of x
h = 1e-4; h2 = 2*h; TolFun = eps;
xx(1) = x0; fx = feval(f,x0,varargin{:});
for k = 1:MaxIter
   if k <= 1, dfdx = (feval(f,xx(k) + h,varargin{:}) - ...
                      feval(f,xx(k) - h,varargin{:}))/h2; %central difference to start
   else dfdx = (fx - fx0)/dx; %Eq.(4.5.1)
   end
   dx = -fx/dfdx;
   xx(k + 1) = xx(k) + dx; %Eq.(4.5.2)
   fx0 = fx; %keep f(x(k)) for the next difference quotient
   fx = feval(f,xx(k + 1),varargin{:});
   if abs(fx) < TolFun | abs(dx) < TolX, break; end
end
x = xx(k + 1);
if k == MaxIter, fprintf('The best in %d iterations\n',MaxIter), end


This secant iterative formula is cast into the MATLAB routine “secant()”, which never needs anything like the derivative as an input argument. We can use this routine “secant()” to solve a nonlinear equation like that dealt with in Example 4.2, by typing the following statement into the MATLAB command window. The process is depicted in Fig. 4.5.

>>[x,err,xx] = secant(f42,2.5,1e-5,50) %with initial guess 2.5
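A Python sketch of Eq. (4.5.2), again an illustrative re-implementation rather than the book's code. Like the MATLAB routine, it needs one central-difference derivative to get started because only a single point is available; afterwards it reuses the last two function values, so each step costs just one new evaluation of f.

```python
import math
import sys

def secant(f, x0, tol_x=1e-5, max_iter=50, h=1e-4):
    """Secant iteration, Eq. (4.5.2); the first step uses a central difference."""
    xx = [x0]
    fx = f(x0)
    fx_prev = dx = None
    for k in range(max_iter):
        if k == 0:
            dfdx = (f(x0 + h) - f(x0 - h)) / (2 * h)   # no previous point yet
        else:
            dfdx = (fx - fx_prev) / dx                  # Eq. (4.5.1)
        dx = -fx / dfdx
        xx.append(xx[-1] + dx)
        fx_prev, fx = fx, f(xx[-1])
        if abs(fx) < sys.float_info.epsilon or abs(dx) < tol_x:
            break
    return xx[-1], fx, xx

def f42(x):
    return math.tan(math.pi - x) - x

x, fx, xx = secant(f42, 2.5)
print(f"x = {x:.6f}, f(x) = {fx:.2e}, steps = {len(xx) - 1}")
```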



Figure 4.5 Solving a nonlinear equation by the secant method. 
4.6 NEWTON METHOD FOR A SYSTEM OF NONLINEAR EQUATIONS


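Note that the methods and the corresponding MATLAB routines mentioned so far can handle only one scalar equation with respect to one scalar variable. In order to see how a system of equations can be solved numerically, we rewrite the two equations

$$ f_1(x_1, x_2) = 0, \qquad f_2(x_1, x_2) = 0 \tag{4.6.1} $$

by taking the Taylor series expansion up to first order about some estimate point $(x_{1k}, x_{2k})$ as

$$ \begin{aligned} f_1(x_1, x_2) &\approx f_1(x_{1k}, x_{2k}) + \left.\frac{\partial f_1}{\partial x_1}\right|_{(x_{1k}, x_{2k})}(x_1 - x_{1k}) + \left.\frac{\partial f_1}{\partial x_2}\right|_{(x_{1k}, x_{2k})}(x_2 - x_{2k}) = 0 \\ f_2(x_1, x_2) &\approx f_2(x_{1k}, x_{2k}) + \left.\frac{\partial f_2}{\partial x_1}\right|_{(x_{1k}, x_{2k})}(x_1 - x_{1k}) + \left.\frac{\partial f_2}{\partial x_2}\right|_{(x_{1k}, x_{2k})}(x_2 - x_{2k}) = 0 \end{aligned} \tag{4.6.2} $$

This can be arranged into a matrix-vector form as

$$ \begin{bmatrix} f_1(x_1, x_2) \\ f_2(x_1, x_2) \end{bmatrix} \approx \begin{bmatrix} f_1(x_{1k}, x_{2k}) \\ f_2(x_{1k}, x_{2k}) \end{bmatrix} + \begin{bmatrix} \partial f_1/\partial x_1 & \partial f_1/\partial x_2 \\ \partial f_2/\partial x_1 & \partial f_2/\partial x_2 \end{bmatrix}_{(x_{1k}, x_{2k})} \begin{bmatrix} x_1 - x_{1k} \\ x_2 - x_{2k} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{4.6.3} $$

which we solve for $(x_1, x_2)$ to get the updated vector estimate

$$ \begin{bmatrix} x_{1,k+1} \\ x_{2,k+1} \end{bmatrix} = \begin{bmatrix} x_{1k} \\ x_{2k} \end{bmatrix} - \begin{bmatrix} \partial f_1/\partial x_1 & \partial f_1/\partial x_2 \\ \partial f_2/\partial x_1 & \partial f_2/\partial x_2 \end{bmatrix}^{-1}_{(x_{1k}, x_{2k})} \begin{bmatrix} f_1(x_{1k}, x_{2k}) \\ f_2(x_{1k}, x_{2k}) \end{bmatrix} \tag{4.6.4} $$

$$ \mathbf{x}_{k+1} = \mathbf{x}_k - J_k^{-1}\mathbf{f}(\mathbf{x}_k) \quad \text{with the Jacobian } J_k(m, n) = \left.\frac{\partial f_m}{\partial x_n}\right|_{\mathbf{x}_k} $$

This is not much different from the Newton iterative formula (4.4.2) and is cast into the MATLAB routine “newtons()”. See Eq. (C.9) in Appendix C for the definition of the Jacobian.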

Now, let's use this routine to solve the following system of nonlinear equations

$$ x_1^2 + 4x_2^2 = 5, \qquad 2x_1^2 - 2x_1 - 3x_2 = 2.5 \tag{4.6.5} $$

In order to do so, we should first rewrite these equations into a form like Eq. (4.6.1) as

$$ f_1(x_1, x_2) = x_1^2 + 4x_2^2 - 5 = 0, \qquad f_2(x_1, x_2) = 2x_1^2 - 2x_1 - 3x_2 - 2.5 = 0 \tag{4.6.6} $$
function [x,fx,xx] = newtons(f,x0,TolX,MaxIter,varargin)
%newtons.m to solve a set of nonlinear eqs f1(x)=0, f2(x)=0,..
%input: f = 1st-order vector ftn equivalent to a set of equations
%       x0 = the initial guess of the solution
%       TolX = the upper limit of |x(k) - x(k - 1)|
%       MaxIter = the maximum # of iteration
%output: x = the point which the algorithm has reached
%        fx = f(x(last))
%        xx = the history of x
h = 1e-4; TolFun = eps; EPS = 1e-6;
fx = feval(f,x0,varargin{:});
Nf = length(fx); Nx = length(x0);
if Nf ~= Nx, error('Incompatible dimensions of f and x0!'); end
if nargin < 4, MaxIter = 100; end
if nargin < 3, TolX = EPS; end
xx(1,:) = x0(:).'; %Initialize the solution as the initial row vector
%fx0 = norm(fx); %(1)
for k = 1:MaxIter
   dx = -jacob(f,xx(k,:),h,varargin{:})\fx(:); %-[dfdx]^-1*fx
   %for l = 1:3 %damping to avoid divergence %(2)
   %dx = dx/2; %(3)
   xx(k + 1,:) = xx(k,:) + dx.';
   fx = feval(f,xx(k + 1,:),varargin{:}); fxn = norm(fx);
   % if fxn < fx0, break; end %(4)
   %end %(5)
   if fxn < TolFun | norm(dx) < TolX, break; end
   %fx0 = fxn; %(6)
end
x = xx(k + 1,:);
if k == MaxIter, fprintf('The best in %d iterations\n',MaxIter), end

function g = jacob(f,x,h,varargin) %Jacobian of f(x)
if nargin < 3, h = 1e-4; end
h2 = 2*h; N = length(x); x = x(:).'; I = eye(N);
for n = 1:N %n-th column by central difference in x(n)
   g(:,n) = (feval(f,x + I(n,:)*h,varargin{:}) ...
            -feval(f,x - I(n,:)*h,varargin{:})).'/h2;
end


and convert it into a MATLAB function defined in an M-file, say, “f46.m”, as follows.

function y = f46(x)
y(1) = x(1)*x(1) + 4*x(2)*x(2) - 5;
y(2) = 2*x(1)*x(1) - 2*x(1) - 3*x(2) - 2.5;


Then, we type the following statements into the MATLAB command window:

>>x0 = [0.8 0.2]; x = newtons('f46',x0) %initial guess [.8 .2]
x = 2.0000 0.5000
Figure 4.6 Solving the set (4.6.6) of nonlinear equations by vector Newton method: (a) Newton method with $(x_{10}, x_{20}) = (0.8, 0.2)$; (b) Newton method with $(x_{10}, x_{20}) = (-1.0, 0.5)$; (c) Newton method with $(x_{10}, x_{20}) = (0.5, 0.2)$; (d) damped Newton method with $(x_{10}, x_{20}) = (0.5, 0.2)$.


Figure 4.6 shows how the vector Newton iteration may proceed depending on the initial guess $(x_{10}, x_{20})$. With $(x_{10}, x_{20}) = (0.8, 0.2)$, it converges to (2, 0.5), which is one of the two real roots (Fig. 4.6a), and with $(x_{10}, x_{20}) = (-1, 0.5)$, it converges to (-1.2065, 0.9413), which is the other real root (Fig. 4.6b). However, with $(x_{10}, x_{20}) = (0.5, 0.2)$, it wanders around, as depicted in Fig. 4.6c. From this figure, we can see that the iteration jumps too far at the beginning and then goes astray around the place where the curves of the two functions $f_1(x)$ and $f_2(x)$ are close but do not cross. One idea for alleviating this problem is to modify the Newton algorithm in such a way that the step size can be adjusted (decreased) to keep the norm of $f(x_k)$ from increasing at each iteration. The so-called damped Newton method based on this idea will be implemented in the MATLAB routine “newtons()” if you activate the six statements numbered from 1 to 6 by deleting the comment mark (%) from the beginning of each line. With the same initial guess $(x_{10}, x_{20}) = (0.5, 0.2)$ as in Fig. 4.6c, the damped Newton method successfully leads to the point (2, 0.5), which is one of the two real roots (Fig. 4.6d).
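The Python sketch below (not from the book) implements Eq. (4.6.4) for the two-equation system (4.6.6) with a central-difference Jacobian in the spirit of “jacob()”. To stay self-contained it solves the 2 × 2 linear system by Cramer's rule, so it only handles two unknowns. The optional damping is one common variant and differs from the commented-out lines (1)-(6) of “newtons()”: it tries the full Newton step first and halves it (at most three times) only while ||f|| fails to decrease, whereas those lines halve the step before the first trial.

```python
import math

def f46(x):
    """Residuals of Eq. (4.6.6)."""
    x1, x2 = x
    return [x1 * x1 + 4 * x2 * x2 - 5, 2 * x1 * x1 - 2 * x1 - 3 * x2 - 2.5]

def jacob(f, x, h=1e-4):
    """Central-difference Jacobian, J[m][n] ~ df_m/dx_n."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def solve2(J, b):
    """Solve the 2x2 system J d = b by Cramer's rule (two unknowns only)."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(b[0] * J[1][1] - J[0][1] * b[1]) / det,
            (J[0][0] * b[1] - J[1][0] * b[0]) / det]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def newtons(f, x0, tol_x=1e-6, max_iter=100, damped=False, max_halvings=3):
    """Vector Newton iteration x_{k+1} = x_k - J_k^{-1} f(x_k), Eq. (4.6.4)."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        dx = solve2(jacob(f, x), [-v for v in fx])
        x_new = [xi + di for xi, di in zip(x, dx)]
        fx_new = f(x_new)
        if damped:
            # Halve the step while ||f|| does not decrease (at most max_halvings times)
            for _ in range(max_halvings):
                if norm(fx_new) < norm(fx):
                    break
                dx = [d / 2 for d in dx]
                x_new = [xi + di for xi, di in zip(x, dx)]
                fx_new = f(x_new)
        x, fx = x_new, fx_new
        if norm(fx) < 1e-12 or norm(dx) < tol_x:
            break
    return x, fx

runs = [((0.8, 0.2), False), ((-1.0, 0.5), False), ((0.5, 0.2), True)]
sols = []
for x0, damped in runs:
    x, fx = newtons(f46, x0, damped=damped)
    sols.append(x)
    print(x0, "damped" if damped else "plain ", [round(v, 4) for v in x])
```

Started from the initial guesses of Fig. 4.6a, 4.6b, and 4.6d, it should reach (2, 0.5), (-1.2065, 0.9413), and (2, 0.5), respectively.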

MATLAB has the built-in function “fsolve(f,x0)”, which can give us a solution for a system of nonlinear equations. Let us try it for Eq. (4.6.5) or (4.6.6), which was already defined in the M-file named ‘f46.m’.

>>x = fsolve('f46',x0,optimset('fsolve')) %with default parameters
x = 2.0000 0.5000


4.7 SYMBOLIC SOLUTION FOR EQUATIONS 

MATLAB has many commands and functions that can be very helpful in dealing with complex analytic (symbolic) expressions and equations as well as in getting numerical solutions. One of them is “solve()”, which can be used for obtaining the symbolic or numeric roots of equations. According to what we could see by typing ‘help solve’ into the MATLAB command window, its usages are as follows:



>>[x1,x2] = solve('x1^2 + 4*x2^2 - 5 = 0','2*x1^2 - 2*x1 - 3*x2 - 2.5 = 0')
x1 = [                     2.]    x2 = [                   0.500000]
     [              -1.206459]         [                   0.941336]
     [ 0.603229 - 0.392630*i]          [-1.095668 - 0.540415e-1*i]
     [ 0.603229 + 0.392630*i]          [-1.095668 + 0.540415e-1*i]
>>S = solve('x^3 - y^3 = 2','x = y') %returns the solution in a structure.
S = x: [3x1 sym]
    y: [3x1 sym]
[                    1]
[ -1/2+1/2*i*3^(1/2)]
[ -1/2-1/2*i*3^(1/2)]
>>[u,v] = solve('a*u^2 + v^2 = 0','u - v = 1') %regarding u,v as unknowns and a as a parameter
u = [1/2/(a + 1)*(-2*a + 2*(-a)^(1/2)) + 1]   v = [1/2/(a + 1)*(-2*a + 2*(-a)^(1/2))]
    [1/2/(a + 1)*(-2*a - 2*(-a)^(1/2)) + 1]       [1/2/(a + 1)*(-2*a - 2*(-a)^(1/2))]
%regards only v as a parameter


Note that in the case where the routine “solve()” finds more symbols than equations in its input arguments, say, M symbols and N equations with M > N, it regards the N symbols closest alphabetically to ‘x’ as variables and the other M - N symbols as constants, giving the symbol after ‘x’ priority over the one before ‘x’ when two symbols are at the same distance from ‘x’. Consequently, the priority order of being treated as a symbolic variable is as follows:

x > y > w > z > v > u > t > s > r > q > ...

Actually, we can use the MATLAB built-in function “findsym()” to see the priority order.

>>syms x y z q r s t u v w %declare 10 symbols to consider
>>findsym(x + y + z*q*r + s + t*u - v - w,10) %symbolic variables?
ans = x,y,w,z,v,u,t,s,r,q
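The rule stated above can be written as a sort key. The helper symbol_priority() below is hypothetical: it only restates the described ordering (smaller alphabetical distance from ‘x’ first, ties going to the letter after ‘x’) and is not how MATLAB implements findsym().

```python
def symbol_priority(symbols, ref="x"):
    """Hypothetical helper: order single-letter symbols by the rule in the text.

    Smaller alphabetical distance from ref comes first; on a tie, the letter
    after ref wins over the letter before it.
    """
    r = ord(ref)
    return sorted(symbols, key=lambda c: (abs(ord(c) - r), ord(c) < r))

order = symbol_priority("xyzqrstuvw")
print(",".join(order))                 # x,y,w,z,v,u,t,s,r,q

# solve('a*u^2 + v^2 = 0','u - v = 1'): 3 symbols, 2 equations
unknowns = symbol_priority("auv")[:2]
print(unknowns)                        # ['v', 'u']
```

Applied to the symbols a, u, and v of the earlier solve() call, the two highest-priority letters are v and u, which agrees with the comment that u and v are treated as unknowns and a as a parameter.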

4.8 A REAL-WORLD PROBLEM 

Let’s see the following example. 

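Example 4.3. The Orbit of NASA's “Wind” Satellite. One of the previous NASA plans is to launch a satellite, called Wind, which is to stay at a fixed position along a line from the earth to the sun, as depicted in Fig. 4.7, so that the solar wind passes around the satellite on its way to earth. In order to find the position of the satellite, we set up the following equation based on the related physical laws:

$$ \frac{GM_s}{r^2} - \frac{GM_e}{(R - r)^2} - r\omega^2 = 0 \tag{E4.3.1} $$

where $r$ is the distance of the satellite from the sun (so that $R - r$ is its distance from earth), $R$ is the earth-sun distance, $G$ is the gravitational constant, $M_s$ and $M_e$ are the masses of the sun and the earth, and $\omega = 2\pi/T$ is the angular velocity of the earth's orbit with period $T$ (one year); these are the constants G, Ms, Me, R, and T used in the program “nm4e03.m”.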


(a) This might be solved for r by using (nonlinear) equation solvers like the routine “newtons()” (Section 4.6) or the MATLAB built-in routine “fsolve()”. We define this residual error function (whose zero is to be found) in the M-file named “phys.m” and run the statements in the following program “nm4e03.m” as

x0 = 1e6; %the initial (starting) guess
rn = newtons('phys',x0,1e-4,100) % newtons()
rfs = fsolve('phys',x0,optimset('fsolve')) % fsolve()
rfs1 = fsolve('phys',x0,optimset('MaxFunEvals',1000)) %more iterations
x01 = 1e10 %with another starting guess closer to the solution
rfs2 = fsolve('phys',x01,optimset('MaxFunEvals',1000))
residual_errs = phys([rn rfs rfs1 rfs2])

which yields

rn = 1.4762e+011 <with residual error of -1.8908e-016>
rfs = 5.6811e+007 <with residual error of 4.0919e+004>
rfs1 = 2.1610e+009 <with residual error of 2.8280e+001>
rfs2 = 1.0000e+010 <with residual error of 1.3203e+000>


It seems that, even with the increased number of function evaluations and another initial guess as suggested in the warning message, “fsolve()” is not as successful as “newtons()” in this case.
(b) Noting that Eq. (E4.3.1) may cause ‘division-by-zero’, we multiply both sides of the equation by $r^2(R - r)^2$ to rewrite it as

$$ r^3(R - r)^2\omega^2 - GM_s(R - r)^2 + GM_e r^2 = 0 \tag{E4.3.2} $$

We define this residual error function in the M-file named “physb.m” and run the following statements in the program “nm4e03.m”:

rnb = newtons('physb',x0)
rfsb = fsolve('physb',x0,optimset('fsolve'))
residual_errs = phys([rnb rfsb])

which yields

rnb = 1.4762e+011 <with residual error of 4.3368e-018>
rfsb = 1.4762e+011 <with residual error of 4.3368e-018>

Both of the two routines “newtons()” and “fsolve()” benefited from the function conversion and succeeded in finding the solution.

(c) The results obtained in (a) and (b) imply that the performance of the nonlinear equation solvers may depend on the shape of the (residual error) function whose zero they aim to find. Here, we try applying them with scaling. On the assumption that the solution is known to be on the order of $10^{11}$, we scale the unknown variable r down by $10^{11}$ to the order of one. This can be done by substituting $r = 10^{11} r'$ into the equations, solving for $r'$, and multiplying the resulting solution by $10^{11}$. We can run the following statements in the program “nm4e03.m”:

scale = 1e11;
rns = newtons('phys',x0/scale,1e-6,100,scale)*scale
rfss = fsolve('phys',x0/scale,optimset('fsolve'),scale)*scale
residual_errs = phys([rns rfss])

which yields

rns = 1.4762e+011 <with residual error of -6.4185e-016>
rfss = 1.4763e+011 <with residual error of -3.3365e-006>

Compared with the results with no scaling obtained in (a), the routine “fsolve()” benefited from scaling and succeeded in finding the solution.

(cf) This example implies the following tips for solving nonlinear equations.

• If you have some preliminary knowledge about the approximate value of the true solution, scale the unknown variable up/down to around one and then scale the resulting solution back down/up to get the solution to the original equation.

• It might be better for you to apply at least two methods to solve the equations as a cross-check. It is suggested to use “newtons()” together with “fsolve()” for confirming the solution of a system of nonlinear equations.
%nm4e03 - astrophysics
global G Ms Me R T
G = 6.67e-11; Ms = 1.98e30; Me = 5.98e24;
R = 1.49e11; T = 3.15576e7; w = 2*pi/T;
x0 = 1e6 %initial guess
format short e
disp('(a)')
rn = newtons('phys',x0)
rfs = fsolve('phys',x0,optimset('fsolve'))
%fsolve('phys',x0)/fsolve('phys',x0,foptions) in MATLAB 5.x version
rfs1 = fsolve('phys',x0,optimset('MaxFunEvals',1000)) %more iterations
%options([2 3 14]) = [1e-4 1e-4 1000];
%fsolve('phys',x0,options) in MATLAB 5.x
x01 = 1e10; %with another starting guess closer to the solution
rfs2 = fsolve('phys',x01,optimset('MaxFunEvals',1000))
residual_errs = phys([rn rfs rfs1 rfs2])
disp('(b)')
rnb = newtons('physb',x0)
rfsb = fsolve('physb',x0,optimset('fsolve'))
residual_errs = phys([rnb rfsb])
disp('(c)')
scale = 1e11; %scale factor bringing r to the order of one
rns = newtons('phys',x0/scale,1e-6,100,scale)*scale;
rfss = fsolve('phys',x0/scale,optimset('fsolve'),scale)*scale
residual_errs = phys([rns rfss])

function f = phys(x,scale)
if nargin < 2, scale = 1; end
global G Ms Me R T
w = 2*pi/T; x = x*scale;
f = G*(Ms./(x.^2 + eps) - Me./((R - x).^2 + eps)) - x*w^2; %Eq.(E4.3.1)

function f = physb(x,scale)
if nargin < 2, scale = 1; end
global G Ms Me R T
w = 2*pi/T; x = x*scale;
f = (R - x).^2.*(w^2*x.^3 - G*Ms) + G*Me*x.^2; %Eq.(E4.3.2)
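For readers without MATLAB, the Python sketch below (an illustration, not the book's program) applies the scaling idea of part (c) to Eq. (E4.3.1) with the constants of “nm4e03.m”, taking G = 6.67e-11. It assumes a Newton iteration with a central-difference derivative in the scaled variable u = r/10^11 and, as an assumption of this sketch, starts from u = 1 (r = 10^11 m) rather than from x0 = 10^6. The eps terms that phys() adds to its denominators are omitted because no iterate reaches r = 0 or r = R along this path.

```python
import math

# Constants of nm4e03.m (SI units); G = 6.67e-11 is assumed for the gravitational constant
G, Ms, Me = 6.67e-11, 1.98e30, 5.98e24
R, T = 1.49e11, 3.15576e7
w = 2 * math.pi / T

def phys(r):
    """Residual of Eq. (E4.3.1); r is the satellite's distance from the sun."""
    return G * (Ms / r**2 - Me / (R - r)**2) - r * w**2

def newton_scaled(f, u0, scale, tol=1e-6, max_iter=100, h=1e-4):
    """Newton's method on g(u) = f(scale*u), central-difference derivative in u."""
    g = lambda u: f(scale * u)
    u = u0
    for _ in range(max_iter):
        dgdu = (g(u + h) - g(u - h)) / (2 * h)
        du = -g(u) / dgdu
        u += du
        if abs(du) < tol:
            break
    return scale * u

r = newton_scaled(phys, 1.0, 1e11)      # start at r = 1e11 m, i.e. u = 1
print(f"r = {r:.4e} m, R - r = {R - r:.3e} m, residual = {phys(r):.1e}")
```

The result should be close to the 1.4762e+011 reported above, which places the satellite roughly 1.38 × 10^9 m from earth (R - r).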


PROBLEMS 

4.1 Fixed-Point Iterative Method 

Consider the simple nonlinear equation

$$ f(x) = x^2 - 3x + 1 = 0 \tag{P4.1.1} $$

Knowing that this equation has two roots

$$ x^o = 1.5 \pm \sqrt{1.25} \approx 2.6180 \text{ or } 0.382; \qquad x^{o1} \approx 0.382, \quad x^{o2} \approx 2.6180 \tag{P4.1.2} $$

investigate the practicability of the fixed-point iteration.

(a) First consider the following iterative formula:

$$ x_{k+1} = g_a(x_k) = \frac{1}{3}(x_k^2 + 1) \tag{P4.1.3} $$








Noting that the first derivative of this iterative function $g_a(x)$ is

$$ g_a'(x) = \frac{2}{3}x \tag{P4.1.4} $$

determine which solution attracts this iteration and certify it in Fig. P4.1a. In addition, run the MATLAB routine “fixpt()” to perform the iteration (P4.1.3) with the initial points $x_0 = 0$, $x_0 = 2$, and $x_0 = 3$. What does the routine yield for each initial point?

(b) Now consider the following iterative formula:

$$ x_{k+1} = g_b(x_k) = 3 - \frac{1}{x_k} \tag{P4.1.5} $$

Noting that the first derivative of this iterative function $g_b(x)$ is

$$ g_b'(x) = \frac{1}{x^2} \tag{P4.1.6} $$

determine which solution attracts this iteration and certify it in Fig. P4.1b. In addition, run the MATLAB routine “fixpt()” to carry out the iteration (P4.1.5) with the initial points $x_0 = 0.2$, $x_0 = 1$, and $x_0 = 3$. What does the routine yield for each initial point?

(cf) This illustrates that the outcome of an algorithm may depend on the starting point.
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4.2 Bisection Method and Fixed-Point Iteration 

Consider the nonlinear equation treated in Example 4.2. 

$$ f(x) = \tan(\pi - x) - x = 0 \qquad \text{(P4.2.1)} $$

Two graphical solutions of this equation are depicted in Fig. P4.2, which 
can be obtained by typing the following statements into the MATLAB 
command window: 

>>ezplot('tan(pi-x)',-pi/2,3*pi/2)

>>hold on, ezplot('x+0',-pi/2,3*pi/2)


(a) In order to use the bisection method for finding the solution between 1.5 and 3, Charley typed the statements shown below. Could he get the right solution? If not, explain to him why he failed and suggest how he could make it work.

>>fp42 = inline('tan(pi-x)-x','x');

>>TolX = 1e-4; MaxIter = 50;

>>x = bisct(fp42,1.5,3,TolX,MaxIter)

(b) In order to find some interval to which the bisection method is applicable, Jessica used the MATLAB command “find()” as shown below.

>>x = [0: 0.5: pi]; y = tan(pi-x) - x;

>>k = find(y(1:end-1).*y(2:end) < 0);

>>[x(k) x(k + 1); y(k) y(k + 1)]

ans =   1.5000    2.0000    2.0000    2.5000
      -15.6014    0.1850    0.1850   -1.7530

This shows that the sign of f(x) changes between x = 1.5 and 2.0 
and also between x = 2.0 and 2.5. Noting this, Jessica thought that she 
might use the bisection method to find a solution between 1.5 and 2.0 
by typing the following command. 

>>x = bisct(fp42,1.5,2,TolX,MaxIter)

Check the validity of the solution, that is, check whether f(x) = 0 or not, by typing

>>fp42(x)

If her solution is not good, explain the reason. If you are not sure about 
it, you can try plotting the graph in Fig. P4.2 by typing the following 
statements into the MATLAB command window. 

>>x = [-pi/2+0.05:0.05:3*pi/2 - 0.05];

>>plot(x,tan(pi - x),x,x)
(cf) This helps us understand why fzero(fp42,1.8) leads to the wrong solution even without any warning message, as mentioned in Example 4.2.

(c) In order to find the solution around x = 2.0 by using the fixed-point iteration with the initial point $x_0 = 2.0$, Vania defined the iterative function as

>>gp421 = inline('tan(pi - x)','x'); %x = g1(x) = tan(pi - x)

and typed the following statement into the MATLAB command window.

>>x = fixpt(gp421,2,TolX,MaxIter)

Could she reach the solution near 2? Would it be better to start the routine from a different initial point? What is wrong?

(d) Itha, seeing what Vania did, decided to try another iterative formula

$$ \tan^{-1}x = \pi - x, \qquad x = g_2(x) = \pi - \tan^{-1}x \qquad \text{(P4.2.2)} $$

So she defined the iterative function as

>>gp422 = inline('pi-atan(x)','x'); %x = g2(x) = pi - atan(x)

and typed the following statement into the MATLAB command window:

>>x = fixpt(gp422,2,TolX,MaxIter)

What could she get? Is it the right solution? Does this command work with a different initial value, like 0 or 6, which is far from the solution we want to find? Describe the difference between Vania’s approach and Itha’s.
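To see why Itha’s rearrangement behaves differently from Vania’s, note that $|g_2'(x)| = 1/(1 + x^2) < 1$ for every $x \ne 0$, whereas $g_1(x) = \tan(\pi - x)$ has $|g_1'(x)| = \sec^2(\pi - x) \ge 1$ everywhere. The sketch below uses a hypothetical inline loop rather than the book’s “fixpt()”; it iterates $x = g_2(x) = \pi - \tan^{-1}x$ from the three starting points mentioned in (d) and checks the residual of (P4.2.1). The tolerance and iteration cap are illustrative.

```matlab
% Fixed-point iteration x = g2(x) = pi - atan(x) of (P4.2.2); illustrative settings
g2  = @(x) pi - atan(x);
f42 = @(x) tan(pi - x) - x;        % residual of (P4.2.1)
TolX = 1e-10; MaxIter = 200;
starts = [0 2 6];
xs = zeros(size(starts));
for n = 1:numel(starts)
  x = starts(n);
  for k = 1:MaxIter
    xnew = g2(x);
    if abs(xnew - x) < TolX, x = xnew; break; end
    x = xnew;
  end
  xs(n) = x;
end
disp([starts; xs; f42(xs)])   % start, limit, residual f42(limit)
```

Because $g_2$ is a contraction away from x = 0, all three starting points should end up at the same root near 2.0288.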
4.3 Recursive (Self-Calling) Routine for Bisection Method

As stated in Section 1.3, MATLAB allows us to make nested (recursive) routines that call themselves. Modify the MATLAB routine “bisct()” (in Section 4.2) into a nested routine “bisct_r()” and run it to solve Eq. (P4.2.1).

4.4 Newton Method and Secant Method 

As can be seen in Fig. 4.5, the secant method introduced in Section 4.5 was devised to remove the need for the derivative/gradient and improve the convergence. But it sometimes turns out to be worse than the Newton method. Apply the routines “newton()” and “secant()” to solve

$$ f_{p44}(x) = x^3 - x^2 - x + 1 = 0 \qquad \text{(P4.4)} $$

starting with the initial point $x_0 = -0.2$ one time and $x_0 = -0.3$ another time.

4.5 Acceleration of Aitken-Steffensen Method 


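A sequence converging to a limit $x^o$ can be described as

$$ x^o - x_{k+1} = e_{k+1} \approx A e_k = A(x^o - x_k) \quad \text{with} \quad \lim_{k \to \infty} \frac{x^o - x_{k+1}}{x^o - x_k} = A \;\; (|A| < 1) \qquad \text{(P4.5.1)} $$

In order to think about how to improve the convergence speed of this sequence, we define a new sequence $p_k$ as

$$ \frac{x^o - x_{k+1}}{x^o - x_k} \approx A \approx \frac{x^o - x_k}{x^o - x_{k-1}}; \qquad (x^o - x_{k+1})(x^o - x_{k-1}) \approx (x^o - x_k)^2 $$
$$ (x^o)^2 - x_{k+1}x^o - x_{k-1}x^o + x_{k+1}x_{k-1} \approx (x^o)^2 - 2x^o x_k + x_k^2 $$
$$ x^o \approx \frac{x_{k+1}x_{k-1} - x_k^2}{x_{k+1} - 2x_k + x_{k-1}} = p_k \qquad \text{(P4.5.2)} $$

(a) Check that the error of this sequence $p_k$ is as follows.

$$ \begin{aligned} x^o - p_k &= x^o - \frac{x_{k+1}x_{k-1} - x_k^2}{x_{k+1} - 2x_k + x_{k-1}} = x^o - \frac{x_{k-1}(x_{k+1} - 2x_k + x_{k-1}) - x_{k-1}^2 + 2x_k x_{k-1} - x_k^2}{x_{k+1} - 2x_k + x_{k-1}} \\ &= x^o - x_{k-1} + \frac{(x_k - x_{k-1})^2}{x_{k+1} - 2x_k + x_{k-1}} \\ &= x^o - x_{k-1} + \frac{\left(-(x^o - x_k) + (x^o - x_{k-1})\right)^2}{-(x^o - x_{k+1}) + 2(x^o - x_k) - (x^o - x_{k-1})} \\ &\approx x^o - x_{k-1} + \frac{(-A + 1)^2 (x^o - x_{k-1})^2}{(-A^2 + 2A - 1)(x^o - x_{k-1})} = 0 \end{aligned} \qquad \text{(P4.5.3)} $$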
Table P4.5 Comparison of Various Methods Applied for Solving Nonlinear Equations




Newton 

Secant 

Steffensen 

Schroder 

fzero() 

fsolve()

x0 = 1.6

fp42

x

2.0288 






f(x)


1.19e-8 




1.72e-9 

Flops 

158 

112 

273 

167 

986 

1454 

x0 = 0

fp44

x



1.0000 




f(x)







Flops 

53 

30 

63 

31 

391 

364 

x0 = 0

fp45

x



5.0000 


NaN 


f(x)





NaN 


Flops 

536 

434 

42 

19 

3683 

1978 


(cf) Since the flops() command is no longer available in MATLAB 6.x, the numbers of floating-point operations were obtained with MATLAB 5.x so that readers can compare the various algorithms in terms of their computational loads.


(b) Modify the routine “newton()” into a routine “stfns()” that generates the sequence (P4.5.2) and run it to solve

$$ f_{p42}(x) = \tan(\pi - x) - x = 0 \quad (\text{with } x_0 = 1.6) \qquad \text{(P4.5.4)} $$
$$ f_{p44}(x) = x^3 - x^2 - x + 1 = 0 \quad (\text{with } x_0 = 0) \qquad \text{(P4.5.5)} $$
$$ f_{p45}(x) = (x - 5)^4 = 0 \quad (\text{with } x_0 = 0) \qquad \text{(P4.5.6)} $$

Fill in Table P4.5 with the results and those obtained by using the routines “newton()”, “secant()” (with the error tolerance TolX = $10^{-5}$), “fzero()”, and “fsolve()”.
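One possible shape for “stfns()” is sketched below for $f_{p45}(x) = (x - 5)^4$: from the current point it takes two ordinary Newton steps and then replaces the point by the accelerated value $p_k$ of (P4.5.2). This is only one reading of the exercise, written as an inline script with an analytic derivative; the variable names and the guard against a zero denominator are assumptions of this sketch, not the book’s code. For this particular f, Newton’s method gives $x^o - x_{k+1} = \frac{3}{4}(x^o - x_k)$ exactly, so the ratio A in (P4.5.1) is exactly 3/4 and a single accelerated step reaches the root.

```matlab
% Sketch: Newton iterates accelerated by the p_k of (P4.5.2), applied to fp45
f  = @(x) (x - 5).^4;              % fp45 of (P4.5.6)
df = @(x) 4*(x - 5).^3;            % its derivative
newton_step = @(x) x - f(x)./df(x);
x = 0; TolX = 1e-5; MaxIter = 50;  % x0 = 0; TolX is an illustrative choice
for k = 1:MaxIter
  x1 = newton_step(x);             % first Newton step
  x2 = newton_step(x1);            % second Newton step
  den = x2 - 2*x1 + x;
  if den == 0, x = x2; break; end  % nothing left to extrapolate
  p = (x2*x - x1^2)/den;           % Eq. (P4.5.2) with (x_{k-1}, x_k, x_{k+1}) = (x, x1, x2)
  dx = p - x; x = p;
  if f(x) == 0 || abs(dx) < TolX, break; end  % stop before a 0/0 Newton step at the root
end
fprintf('x = %.10f after %d accelerated step(s)\n', x, k)
```

The early exit on f(x) == 0 matters here: at the root $f'(x) = 0$ as well, so another Newton step would produce 0/0.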

4.6 Acceleration of Newton Method for Multiple Roots: Schroder Method

In order to improve the convergence speed, Schroder modifies the Newton iterative algorithm (4.4.2) as

$$ x_{k+1} = x_k - M\frac{f(x_k)}{f'(x_k)} \qquad \text{(P4.6.1)} $$

with M: the order of multiplicity of the root we want to find

Based on this idea, modify the routine “newton()” into a routine “schroder()” and run it to solve Eqs. (P4.5.4) to (P4.5.6). Fill in the corresponding blanks of Table P4.5 with the results.
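The effect of the factor M in (P4.6.1) can be checked on $f_{p44}(x) = x^3 - x^2 - x + 1 = (x - 1)^2(x + 1)$, which has a double root at x = 1. The sketch below is an illustration rather than the book’s “schroder()”: it runs the same loop with M = 1 (plain Newton) and M = 2. It evaluates $f_{p44}$ in factored form to limit cancellation near the double root, and it starts from the hypothetical point $x_0 = 2$ because Newton from $x_0 = 0$ happens to land exactly on x = 1 in one step.

```matlab
% Newton (M = 1) versus Schroder (M = 2) on the double root of fp44
f  = @(x) (x - 1).^2.*(x + 1);     % = x^3 - x^2 - x + 1
df = @(x) (x - 1).*(3*x + 1);      % = 3x^2 - 2x - 1
TolX = 1e-10; MaxIter = 100;
result = zeros(2,2);               % row M: [final x, iterations]
for M = [1 2]
  x = 2;                           % illustrative initial point
  for k = 1:MaxIter
    if f(x) == 0, break; end       % landed exactly on the root
    dx = -M*f(x)/df(x);            % Eq. (P4.6.1)
    x = x + dx;
    if abs(dx) < TolX, break; end
  end
  result(M,:) = [x k];
  fprintf('M = %d: x = %.12f after %d iterations\n', M, x, k)
end
```

With M = 1 the error only roughly halves per step near the double root, while M = 2 restores quadratic convergence, so the second run should need far fewer iterations.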
4.7 Newton Method for Systems of Nonlinear Equations

Apply the routine “newtons()” (Section 4.6) and the MATLAB built-in routine “fsolve()” (with [x0 y0] = [1 0.5]) to solve the following systems of equations. Fill in Table P4.7 with the results.

(a)

$$ x^2 + y^2 = 1, \qquad x^2 - y = 0 \qquad \text{(P4.7.1)} $$

(b)

$$ 5\cos\theta_1 + 6\cos(\theta_1 + \theta_2) = 10, \qquad 5\sin\theta_1 + 6\sin(\theta_1 + \theta_2) = 4 \qquad \text{(P4.7.2)} $$

(c)

$$ 3x^2 + 4y^2 = 3, \qquad x^2 + y^2 = \sqrt{3}/2 \qquad \text{(P4.7.3)} $$

(d) 

x\ + 10xi - *2 = 5 

X\+x\— 10x 2 = -1 

(P4.7.4) 

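(e)

$$ x^2 - \sqrt{3}xy + 2y^2 = 10, \qquad 4x^2 + 3\sqrt{3}xy + y = 22 \qquad \text{(P4.7.5)} $$

(f)

$$ x^3 y - y - 2x^3 = -16, \qquad x - y^2 = -1 \qquad \text{(P4.7.6)} $$

(g)

$$ x^2 + 4y^2 = 16, \qquad xy^2 = 4 \qquad \text{(P4.7.7)} $$

(h)

$$ xe^y - x^5 + y = 3, \qquad x + y + \tan x - \sin y = 0 \qquad \text{(P4.7.8)} $$

(i)

$$ 2\log y - x = 0, \qquad xy - y = 1 \qquad \text{(P4.7.9)} $$

(j)

$$ 12xy - 6x = -1, \qquad 60x^2 - 180x^2 y - 30xy = 1 \qquad \text{(P4.7.10)} $$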
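As a rough picture of what a Newton routine for systems does with (P4.7.1), the sketch below runs Newton’s method with a forward-difference Jacobian from [x0 y0] = [1 0.5]. It is a stand-in written for illustration, not the book’s “newtons()”: the Jacobian step dh and the stopping rule are assumptions. Eliminating $x^2$ gives $y^2 + y - 1 = 0$, so the answer can be compared with $y = (\sqrt{5} - 1)/2$ and $x = \sqrt{y}$.

```matlab
% Newton's method for a 2x2 system with a forward-difference Jacobian
F = @(v) [v(1)^2 + v(2)^2 - 1;  v(1)^2 - v(2)];      % (P4.7.1) written as F(v) = 0
v = [1; 0.5]; TolX = 1e-10; MaxIter = 50; dh = 1e-7;  % dh: assumed difference step
for k = 1:MaxIter
  Fv = F(v);
  J = zeros(2,2);
  for j = 1:2                       % build the Jacobian column by column
    e = zeros(2,1); e(j) = dh;
    J(:,j) = (F(v + e) - Fv)/dh;
  end
  dv = -J\Fv;                       % Newton step
  v = v + dv;
  if norm(dv) < TolX, break; end
end
disp(v.')
```

The same loop works unchanged for the three-unknown systems of Problem 4.8 once F returns a 3-vector and the Jacobian is sized accordingly.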


4.8 Newton Method for Systems of Nonlinear Equations

Apply the routine “newtons()” (Section 4.6) and the MATLAB built-in routine “fsolve()” (with [x0 y0 z0] = [1 1 1]) to solve the following systems of equations. Fill in Table P4.8 with the results.

(a)

$$ xyz = -1, \qquad x^2 + 2y^2 + 4z^2 = 7, \qquad 2x^2 + y^3 + 6z = 7 \qquad \text{(P4.8.1)} $$

(b)

$$ xyz = 1, \qquad x^2 + 2y^3 + z^2 = 4, \qquad x + 2y^2 - z^3 = 2 \qquad \text{(P4.8.2)} $$

(c)

$$ x^2 + 4y^2 + 9z^2 = 34, \qquad x^2 + 9y^2 - 5z = 40, \qquad x^2 z - y = 7 \qquad \text{(P4.8.3)} $$

(d) 

x 1 4-2 sin(y 7 r/ 2 ) + z 2 = 0 



-2xy + z = 3 
- z 2 = 0 

(P4.8.4) 
Table P4.8 Applying newtons()/fsolve() for Systems of Nonlinear Equations

Equation   x0        x                         ||f(x)||                   Flops: newtons()   fsolve()
(P4.8.1)   [1 1 1]   [1.0000 -1.0000 1.0000]   1.1102e-16 (1.1102e-16)    8158               12964
(P4.8.2)   [1 1 1]   [1 1 1]                   0                          990                854
(P4.8.3)   [1 1 1]                                                        6611               4735
(P4.8.4)   [1 1 1]   [1.0000 -1.0000 1.0000]   4.5506e-15 (4.6576e-15)    18,273             21,935
(P4.8.5)   [1 1 1]                                                        6811               5525
(P4.8.6)   [1 1 1]   [2.0000 1.0000 3.0000]    3.4659e-8 (2.6130e-8)      6191               4884
(P4.8.7)   [1 1 1]   [1.0000 3.0000 2.0000]    1.0022e-13 (1.0437e-13)    8055               6102


(e)

$$ x^2 + y^2 + z^2 = 14, \qquad x^2 + 2y^2 - z = 6, \qquad x - 3y^2 + z^2 = -2 \qquad \text{(P4.8.5)} $$

(f)

$$ x^3 - 12y + z^2 = 5, \qquad 3x^2 + y^3 - 2z = 7, \qquad x + 24y^2 - 2\sin(\pi z/18) = 25 \qquad \text{(P4.8.6)} $$

(g)

$$ x^2 + y^2 - 2z = 6, \qquad x^2 - 2y + z^3 = 3, \qquad 2xz - 3y^2 - z^2 = -27 \qquad \text{(P4.8.7)} $$


4.9 Newton Method for a System of Nonlinear Equations with Varying Parameters

In order to find the average modulation order $x_i$ for each user of an OFDM (orthogonal frequency division multiplex) system that has N (= 128) subchannels to assign to the four users in the environment of noise power $N_0$ and the bit error rate (probability of bit error) $P_e$, a communication system expert, Mi-hyun, formulated the problem as the following system of five nonlinear equations:

$$ f_i(x) = \left(2^{x_i}(x_i \ln 2 - 1) + 1\right)\frac{2N_0}{3}\left(\mathrm{erfc}^{-1}(P_e/2)\right)^2 - \lambda = 0 \quad \text{for } i = 1, 2, 3, 4 \qquad \text{(P4.9.1)} $$
$$ f_5(x) = \sum_{i=1}^{4}\frac{a_i}{x_i} - N = 0 \qquad \text{(P4.9.2)} $$

where N = 128 and $a_i$ is the data rate of each user, and $\mathrm{erfc}^{-1}(x)$ is the inverse function of the complementary error function

$$ \mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^{\infty} e^{-t^2}\,dt = 1 - \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt = 1 - \mathrm{erf}(x) \qquad \text{(P4.9.3)} $$

which is available as the MATLAB built-in function “erfcinv()”. She defined the mismatching error (vector) function as below and saved it in the M-file named “fp_bits.m”.


function y = fp_bits(x,a,Pe)
%x(i),i = 1:4 correspond to the modulation order of each user
%x(5) corresponds to the Lagrange multiplier (Lambda)
if nargin < 3, Pe = 1e-4;
  if nargin < 2, a = [64 64 64 64]; end
end
N = 128; N0 = 1;
x14 = x(1:4);
y = (2.^x14.*(log(2)*x14 - 1)+1)*N0/3*2*erfcinv(Pe/2).^2 - x(5); %Eq.(P4.9.1)
y(5) = sum(a(:)./x14(:)) - N; %Eq.(P4.9.2), shape-safe for row or column x


Compose a program which solves the above system of nonlinear equations (with $N_0 = 1$ and $P_e = 10^{-4}$) to get the modulation order $x_i$ of each user for five different sets of data rates

a = [32 32 32 32], [64 32 32 32], [128 32 32 32], [256 32 32 32], and [512 32 32 32]

and plots $a_1/x_1$ (the number of subchannels assigned to user 1) versus $a_1$ (the data rate of user 1).

4.10 Temperature Rising from Heat Flux in a Semi-infinite Slab

Consider a semi-infinite slab whose temperature rises as a function of position x > 0 and time t > 0 as

$$ T(x, t) = \frac{Qx}{k}\left(\frac{e^{-s^2}}{\sqrt{\pi}\,s} - \mathrm{erfc}(s)\right) \quad \text{with} \quad s^2 = \frac{x^2}{4\alpha t} \qquad \text{(P4.10.1)} $$

where the function erfc() is defined by Eq. (P4.9.3) and

Q (heat flux) = 200 J/m²s, k (conductivity) = 0.015 J/m/s/°C, α (diffusivity) = 2.5 × 10⁻⁵ m²/s

In order to find the heat transfer speed, a heating system expert, Kyungwon, wants to solve the above equation to get the positions x(t) with a temperature rise of T = 30 °C at t = 10:10:200 s. Compose the program which does this job and plots x(t) versus t.
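A possible approach is sketched below, assuming (P4.10.1) reads $T(x,t) = \frac{Qx}{k}\left(\frac{e^{-s^2}}{\sqrt{\pi}\,s} - \mathrm{erfc}(s)\right)$ with $s = x/(2\sqrt{\alpha t})$. Differentiating gives $\partial T/\partial x = -(Q/k)\,\mathrm{erfc}(s) < 0$, so for each t the equation T(x, t) = 30 has a single root that the core MATLAB routine “fzero()” can find once it is bracketed. The bracket [1e-6, 1] m is an assumption chosen for these parameter values; its lower end avoids the 0/0 at x = 0.

```matlab
% Positions x(t) where the temperature rise T(x,t) of (P4.10.1) equals 30 deg C
Q = 200; k = 0.015; alpha = 2.5e-5;        % heat flux, conductivity, diffusivity
Ttarget = 30;
s = @(x,t) x./(2*sqrt(alpha*t));           % s^2 = x^2/(4*alpha*t)
T = @(x,t) (Q*x/k).*(exp(-s(x,t).^2)./(sqrt(pi)*s(x,t)) - erfc(s(x,t)));
t = 10:10:200;
x = zeros(size(t));
for n = 1:numel(t)
  x(n) = fzero(@(xx) T(xx,t(n)) - Ttarget, [1e-6 1]);   % bracketed root
end
disp([t; x].')
% plot(t, x), xlabel('t [s]'), ylabel('x(t) [m]')      % uncomment to draw the curve
```

Since $\partial T/\partial t > 0$ for every x > 0, the computed positions should increase with t.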

4.11 Damped Newton Method for a Set of Nonlinear Equations

Consider the routine “newtons()”, which is made for solving a system of equations and introduced in Section 4.6.

(a) Run the routine with the initial point $(x_{10}, x_{20}) = (0.5, 0.2)$ to solve Eq. (4.6.5) and certify that it does not yield the right solution, as depicted in Fig. 4.6c.

(b) In order to keep the step size adjusted in the case where the norm of the vector function $\mathbf{f}(\mathbf{x}_{k+1})$ at iteration k + 1 is larger than that of $\mathbf{f}(\mathbf{x}_k)$ at iteration k, insert (activate) the statements numbered from 1 to 6 of the routine “newtons()” (Section 4.6) by deleting the comment mark (%) at the beginning of each line to make a modified routine “newtonds()”, which implements the damped Newton method. Run it with the initial point $(x_{10}, x_{20}) = (0.5, 0.2)$ to solve Eq. (4.6.5) and certify that it yields the right solution, as depicted in Fig. 4.6d.

(c) Run the MATLAB built-in routine “fsolve()” with the initial point $(x_{10}, x_{20}) = (0.5, 0.2)$ to solve Eq. (4.6.5). Does it give you the right solution?
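The step-size control described in (b) can be sketched independently of the book’s code: compute the Newton step, then halve it until the residual norm no longer grows. The statements numbered 1 to 6 in “newtons()” are not reproduced here, so the sketch below is an assumption about the idea rather than a copy of them. Because Eq. (4.6.5) is not restated in this section, the sketch is tried on (P4.7.1) with an analytic Jacobian, from the hypothetical starting point [3 3].

```matlab
% Damped Newton sketch: halve the step until ||F|| stops growing
F = @(v) [v(1)^2 + v(2)^2 - 1;  v(1)^2 - v(2)];   % system (P4.7.1)
J = @(v) [2*v(1) 2*v(2);  2*v(1) -1];              % its Jacobian
v = [3; 3]; TolX = 1e-12; MaxIter = 100; MaxHalve = 30;
for k = 1:MaxIter
  dv = -J(v)\F(v);                   % full Newton step
  for m = 1:MaxHalve                 % damping: shrink the step if the residual grows
    if norm(F(v + dv)) <= norm(F(v)), break; end
    dv = dv/2;
  end
  v = v + dv;
  if norm(dv) < TolX, break; end
end
fprintf('x = %.10f, y = %.10f, ||F|| = %.2e\n', v(1), v(2), norm(F(v)))
```

The inner loop never lets an accepted step increase the residual norm (unless it gives up after MaxHalve halvings), which is the safeguard that part (b) adds to the plain Newton iteration.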
5.1 DIFFERENCE APPROXIMATION FOR FIRST DERIVATIVE


For a function f(x) of a variable x, its first derivative is defined as

$$ f'(x) = \lim_{h \to 0}\frac{f(x+h) - f(x)}{h} \qquad (5.1.1) $$

However, this gives our computers a headache, since they do not know how to take a limit. Any input number given to computers must be a definite number and can be neither too small nor too large to be understood by the computer. The ‘theoretically’ infinitesimal number h involved in this equation is a problem.

A simple approximation that computers might be happy with is the forward difference approximation

$$ D_{f1}(x, h) = \frac{f(x+h) - f(x)}{h} \quad (h \text{ is step size}) \qquad (5.1.2) $$

How far away is this approximation from the true value of (5.1.1)? In order to do the error analysis, we take the Taylor series expansion of f(x + h) about x as

$$ f(x+h) = f(x) + hf'(x) + \frac{h^2}{2!}f^{(2)}(x) + \frac{h^3}{3!}f^{(3)}(x) + \cdots \qquad (5.1.3) $$
Subtracting f(x) from both sides and dividing both sides by the step size h yields

$$ D_{f1}(x, h) = \frac{f(x+h) - f(x)}{h} = f'(x) + \frac{h}{2}f^{(2)}(x) + \frac{h^2}{3!}f^{(3)}(x) + \cdots = f'(x) + O(h) \qquad (5.1.4) $$

where O(g(h)), called ‘big Oh of g(h)’, denotes a truncation error term proportional to g(h) for |h| < 1. This means that the error of the forward difference approximation (5.1.2) of the first derivative is proportional to the step size h, or, equivalently, in the order of h.

Now, in order to derive another approximation formula for the first derivative having a smaller error, let’s remove the first-order term with respect to h from Eq. (5.1.4) by substituting 2h for h in the equation

$$ D_{f1}(x, 2h) = \frac{f(x+2h) - f(x)}{2h} = f'(x) + \frac{2h}{2}f^{(2)}(x) + \frac{(2h)^2}{3!}f^{(3)}(x) + \cdots $$

and subtracting this result from two times the equation. Then, we get

$$ 2D_{f1}(x, h) - D_{f1}(x, 2h) = 2\,\frac{f(x+h) - f(x)}{h} - \frac{f(x+2h) - f(x)}{2h} = f'(x) - \frac{2h^2}{3!}f^{(3)}(x) - \cdots $$
$$ D_{f2}(x, h) = \frac{2D_{f1}(x, h) - D_{f1}(x, 2h)}{2 - 1} = \frac{-f(x+2h) + 4f(x+h) - 3f(x)}{2h} = f'(x) + O(h^2) \qquad (5.1.5) $$

which can be regarded as an improvement over Eq. (5.1.4), since it has the truncation error of O(h²) for |h| < 1.

How about the backward difference approximation?

$$ D_{b1}(x, h) = \frac{f(x) - f(x-h)}{h} \equiv D_{f1}(x, -h) \quad (h \text{ is step size}) \qquad (5.1.6) $$

This also has an error of O(h) and can be processed to yield an improved version having a truncation error of O(h²).

$$ D_{b2}(x, h) = \frac{2D_{b1}(x, h) - D_{b1}(x, 2h)}{2 - 1} = \frac{3f(x) - 4f(x-h) + f(x-2h)}{2h} = f'(x) + O(h^2) \qquad (5.1.7) $$


In order to derive another approximation formula for the first derivative, we take the Taylor series expansion of f(x + h) and f(x - h) up to the fifth order to write

$$ f(x+h) = f(x) + hf'(x) + \frac{h^2}{2!}f^{(2)}(x) + \frac{h^3}{3!}f^{(3)}(x) + \frac{h^4}{4!}f^{(4)}(x) + \frac{h^5}{5!}f^{(5)}(x) + \cdots $$
$$ f(x-h) = f(x) - hf'(x) + \frac{h^2}{2!}f^{(2)}(x) - \frac{h^3}{3!}f^{(3)}(x) + \frac{h^4}{4!}f^{(4)}(x) - \frac{h^5}{5!}f^{(5)}(x) + \cdots $$

and divide the difference between these two equations by 2h to get the central difference approximation for the first derivative as

$$ D_{c2}(x, h) = \frac{f(x+h) - f(x-h)}{2h} = f'(x) + \frac{h^2}{3!}f^{(3)}(x) + \frac{h^4}{5!}f^{(5)}(x) + \cdots = f'(x) + O(h^2) \qquad (5.1.8) $$

which has an error of O(h²) similarly to Eqs. (5.1.5) and (5.1.7). This can also be processed to yield an improved version having a truncation error of O(h⁴).


$$ 2^2 D_{c2}(x, h) - D_{c2}(x, 2h) = 4\,\frac{f(x+h) - f(x-h)}{2h} - \frac{f(x+2h) - f(x-2h)}{4h} = 3f'(x) - \frac{h^4}{10}f^{(5)}(x) - \cdots $$
$$ D_{c4}(x, h) = \frac{2^2 D_{c2}(x, h) - D_{c2}(x, 2h)}{2^2 - 1} = \frac{8f(x+h) - 8f(x-h) - f(x+2h) + f(x-2h)}{12h} = f'(x) + O(h^4) \qquad (5.1.9) $$


Furthermore, this procedure can be generalized into a formula, called ‘Richardson’s extrapolation’, for improving the difference approximation of the derivatives as follows:

<Richardson’s extrapolation>

$$ D_{f,n+1}(x, h) = \frac{2^n D_{f,n}(x, h) - D_{f,n}(x, 2h)}{2^n - 1} \quad (n\text{: the order of error}) \qquad (5.1.10a) $$
$$ D_{b,n+1}(x, h) = \frac{2^n D_{b,n}(x, h) - D_{b,n}(x, 2h)}{2^n - 1} \qquad (5.1.10b) $$
$$ D_{c,2(n+1)}(x, h) = \frac{2^{2n} D_{c,2n}(x, h) - D_{c,2n}(x, 2h)}{2^{2n} - 1} \qquad (5.1.10c) $$
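Richardson’s extrapolation (5.1.10) is easy to try numerically. In the sketch below, f(x) = sin x, x = π/4, and h = 0.1 are arbitrary illustrative choices, and “richardson” is a hypothetical helper that applies (5.1.10) to any difference formula D(h) whose leading error is of order $h^n$. Applying it to $D_{f1}$ with n = 1 should reproduce $D_{f2}$ of (5.1.5), and applying it to $D_{c2}$ with n = 2 (the case 2n = 2 of (5.1.10c)) should reproduce $D_{c4}$ of (5.1.9).

```matlab
% Richardson's extrapolation (5.1.10) applied to first-derivative formulas
f = @(x) sin(x); x = pi/4; h = 0.1; true_df = cos(x);
richardson = @(D, n, h) (2^n*D(h) - D(2*h))/(2^n - 1);   % hypothetical helper
Df1 = @(h) (f(x + h) - f(x))/h;              % (5.1.4), error O(h)
Dc2 = @(h) (f(x + h) - f(x - h))/(2*h);      % (5.1.8), error O(h^2)
Df2 = richardson(Df1, 1, h);                 % should match (5.1.5)
Dc4 = richardson(Dc2, 2, h);                 % should match (5.1.9)
Df2_direct = (-f(x + 2*h) + 4*f(x + h) - 3*f(x))/(2*h);
Dc4_direct = (8*f(x + h) - 8*f(x - h) - f(x + 2*h) + f(x - 2*h))/(12*h);
errs = [Df1(h) Df2 Dc2(h) Dc4] - true_df;
disp(errs)   % errors of D_f1, D_f2, D_c2, D_c4 at h = 0.1
```

Each extrapolation step removes the leading error term, so the printed errors should shrink from $D_{f1}$ to $D_{f2}$ and from $D_{c2}$ to $D_{c4}$.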


5.2 APPROXIMATION ERROR OF FIRST DERIVATIVE 

In the previous section, we derived some difference approximation formulas for the first derivative. Since their errors are proportional to some power of the step size h, it seems that the errors continue to decrease as h gets smaller. However, this is only half of the story since we considered only the truncation error caused by truncating the high-order terms in the Taylor series expansion and did not take account of the round-off error caused by quantization.

In this section, we will discuss the round-off error as well as the truncation 
error so as to gain a better understanding of how the computer really works. For 
this purpose, suppose that the function values 

$$ f(x+2h),\; f(x+h),\; f(x),\; f(x-h),\; f(x-2h) $$

are quantized (rounded off) to

$$ y_2 = f(x+2h) + e_2,\quad y_1 = f(x+h) + e_1,\quad y_0 = f(x) + e_0,\quad y_{-1} = f(x-h) + e_{-1},\quad y_{-2} = f(x-2h) + e_{-2} \qquad (5.2.1) $$

where the magnitudes of the round-off (quantization) errors $e_2$, $e_1$, $e_0$, $e_{-1}$, and $e_{-2}$ are all smaller than some positive number ε, that is, $|e_i| \le \varepsilon$. Then, the total error of the forward difference approximation (5.1.4) can be derived as


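$$ D_{f1}(x, h) = \frac{y_1 - y_0}{h} = \frac{f(x+h) + e_1 - f(x) - e_0}{h} = f'(x) + \frac{e_1 - e_0}{h} + \frac{K_1}{2}h + \cdots $$
$$ |D_{f1}(x, h) - f'(x)| \le \frac{|e_1 - e_0|}{|h|} + \frac{|K_1|}{2}|h| \le \frac{2\varepsilon}{|h|} + \frac{|K_1|}{2}|h| \quad \text{with } K_1 = f^{(2)}(x) $$

Look at the right-hand side of this inequality, that is, the upper bound of error. It consists of two parts; the first one is due to the round-off error and in inverse proportion to the step size h, while the second one is due to the truncation error and in direct proportion to h. Therefore, the upper bound of the total error can be minimized with respect to the step size h to give the optimum step size $h_o$ as

$$ \frac{d}{dh}\left(\frac{2\varepsilon}{h} + \frac{|K_1|}{2}h\right) = -\frac{2\varepsilon}{h^2} + \frac{|K_1|}{2} = 0, \qquad h_o = 2\sqrt{\frac{\varepsilon}{|K_1|}} \qquad (5.2.2) $$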


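The total error of the central difference approximation (5.1.8) can also be derived as follows:

$$ D_{c2}(x, h) = \frac{y_1 - y_{-1}}{2h} = \frac{f(x+h) + e_1 - f(x-h) - e_{-1}}{2h} = f'(x) + \frac{e_1 - e_{-1}}{2h} + \frac{K_2}{6}h^2 + \cdots $$
$$ |D_{c2}(x, h) - f'(x)| \le \frac{|e_1 - e_{-1}|}{2|h|} + \frac{|K_2|}{6}|h|^2 \le \frac{\varepsilon}{|h|} + \frac{|K_2|}{6}|h|^2 \quad \text{with } K_2 = f^{(3)}(x) $$

The right-hand side of this inequality is minimized to yield the optimum step size $h_o$ as

$$ \frac{d}{dh}\left(\frac{\varepsilon}{h} + \frac{|K_2|}{6}h^2\right) = -\frac{\varepsilon}{h^2} + \frac{|K_2|}{3}h = 0, \qquad h_o = \sqrt[3]{\frac{3\varepsilon}{|K_2|}} \qquad (5.2.3) $$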


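Similarly, we can derive the total error of the central difference approximation (5.1.9) as

$$ |D_{c4}(x, h) - f'(x)| \le \frac{|8e_1 - 8e_{-1} - e_2 + e_{-2}|}{12|h|} + \frac{|K_4|}{30}|h|^4 \le \frac{18\varepsilon}{12|h|} + \frac{|K_4|}{30}|h|^4 \quad \text{with } K_4 = f^{(5)}(x) $$

and find out the optimum step size $h_o$ as

$$ \frac{d}{dh}\left(\frac{3\varepsilon}{2h} + \frac{|K_4|}{30}h^4\right) = -\frac{3\varepsilon}{2h^2} + \frac{2|K_4|}{15}h^3 = 0, \qquad h_o = \sqrt[5]{\frac{45\varepsilon}{4|K_4|}} \qquad (5.2.4) $$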


From what we have seen so far, we can tell that, as we make the step size h smaller, the round-off error may increase, while the truncation error decreases. This is called the ‘step-size dilemma’. Therefore, there must be some optimal step size $h_o$ for the difference approximation formulas, as derived analytically in Eqs. (5.2.2), (5.2.3), and (5.2.4). However, these equations are only of theoretical value and cannot be used practically to determine $h_o$ because we usually don’t have any information about the high-order derivatives and, consequently, we cannot estimate $K_1, K_2, \ldots$. Besides, noting that $h_o$ minimizes not the real error but its upper bound, we can never expect the true optimal step size to be uniform for all x, even with the same approximation formula.

Now, we can verify the step-size dilemma and the existence of some optimal step size $h_o$ by computing the numerical derivative of a function, say, f(x) = sin x, whose analytical derivatives are well known. To see how the errors of the difference approximation formulas (5.1.4) and (5.1.8) depend on the step size h, we computed their values for x = π/4 together with their errors, as summarized in Tables 5.1 and 5.2. From these results, it appears that the errors of (5.1.4) and (5.1.8) are minimized with $h \approx 10^{-8}$ and $h \approx 10^{-5}$, respectively. This may be justified by the following facts:

Noting that the number of significant bits is 52, which is the number of mantissa bits (Section 1.2.1), or, equivalently, the number of significant digits is about 52 × 3/10 ≈ 16 (since $2^{10} \approx 10^3$), and the value of f(x) = sin x is less than or equal to one, the round-off error is roughly

$$ \varepsilon \approx 10^{-16}/2 $$
Table 5.1 The Forward Difference Approximation (5.1.4) for the First Derivative of f(x) = sin x and Its Error from the True Value (cos(π/4) = 0.7071067812) Depending on the Step Size h

h_k = 10^-k           D_1k at x = π/4   D_1k - D_1(k-1)   D_1k at x = π/4 - cos(π/4)
h_1  = 0.1000000000   0.6706029729                        -0.03650380828
h_2  = 0.0100000000   0.7035594917      0.0329565188      -0.00354728950
h_3  = 0.0010000000   0.7067531100      0.0031936183      -0.00035367121
h_4  = 0.0001000000   0.7070714247      0.0003183147      -0.00003535652
h_5  = 0.0000100000   0.7071032456      0.0000318210      -0.00000353554
h_6  = 0.0000010000   0.7071064277      0.0000031821      -0.00000035344
h_7  = 0.0000001000   0.7071067454      0.0000003176      -0.00000003581
h_8  = 0.0000000100*  0.7071067842      0.0000000389       0.00000000305*
h_9  = 0.0000000010   0.7071068175      0.0000000333*      0.00000003636
h_10 = 0.0000000001   0.7071077057      0.0000008882       0.00000092454
h_o  = 0.0000000168 (the optimal value of h obtained from Eq. (5.2.2))

Table 5.2 The Central Difference Approximation (5.1.8) for the First Derivative of f(x) = sin x and Its Error from the True Value (cos(π/4) = 0.7071067812) Depending on the Step Size h

h_k = 10^-k           D_2k at x = π/4   D_2k - D_2(k-1)   D_2k at x = π/4 - cos(π/4)
h_1  = 0.1000000000   0.7059288590                        -0.00117792219
h_2  = 0.0100000000   0.7070949961      0.0011661371      -0.00001178505
h_3  = 0.0010000000   0.7071066633      0.0000116672      -0.00000011785
h_4  = 0.0001000000   0.7071067800      0.0000001167      -0.00000000118
h_5  = 0.0000100000*  0.7071067812      0.0000000012      -0.00000000001*
h_6  = 0.0000010000   0.7071067812      0.0000000001*      0.00000000005
h_7  = 0.0000001000   0.7071067804     -0.0000000009      -0.00000000084
h_8  = 0.0000000100   0.7071067842      0.0000000039       0.00000000305
h_9  = 0.0000000010   0.7071067620     -0.0000000222      -0.00000001915
h_10 = 0.0000000001   0.7071071506      0.0000003886       0.00000036942
h_o  = 0.0000059640 (the optimal value of h obtained from Eq. (5.2.3))


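Accordingly, Eqs. (5.2.2) and (5.2.3) give the theoretical optimal values of step size h as

$$ h_o = 2\sqrt{\frac{\varepsilon}{|K_1|}} = 2\sqrt{\frac{\varepsilon}{|f^{(2)}(\pi/4)|}} = 2\sqrt{\frac{10^{-16}/2}{|-\sin(\pi/4)|}} = 1.68 \times 10^{-8} $$
$$ h_o = \sqrt[3]{\frac{3\varepsilon}{|K_2|}} = \sqrt[3]{\frac{3\varepsilon}{|f^{(3)}(\pi/4)|}} = \sqrt[3]{\frac{3 \times 10^{-16}/2}{|-\cos(\pi/4)|}} = 0.5964 \times 10^{-5} $$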
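The entries of Tables 5.1 and 5.2 and the two values of $h_o$ above can be regenerated with a short script like the one below. This is a sketch assuming IEEE double precision and the round-off level ε = 10⁻¹⁶/2 used in the text; the last digits of the smallest-h rows depend on round-off and may differ slightly from platform to platform.

```matlab
% Forward (5.1.4) and central (5.1.8) differences of sin(x) at x = pi/4
x = pi/4; true_df = cos(x);
h = 10.^(-(1:10));                            % h_k = 10^-k, k = 1,...,10
Df = (sin(x + h) - sin(x))./h;                % D_1k of Table 5.1
Dc = (sin(x + h) - sin(x - h))./(2*h);        % D_2k of Table 5.2
disp([h.' Df.' (Df - true_df).' Dc.' (Dc - true_df).'])
[~, kf] = min(abs(Df - true_df)); [~, kc] = min(abs(Dc - true_df));
fprintf('best h: forward 1e-%d, central 1e-%d\n', kf, kc)
% Theoretical optimum step sizes (5.2.2) and (5.2.3) with eps_r = 1e-16/2
eps_r = 1e-16/2;
ho_f = 2*sqrt(eps_r/abs(-sin(x)))            % about 1.68e-8
ho_c = (3*eps_r/abs(-cos(x)))^(1/3)          % about 5.96e-6
```

The index of the smallest error printed by the script is the empirical counterpart of $h_o$; as the text notes, it need not coincide exactly with the theoretical value, which minimizes only an error bound.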
Figure 5.1 Forward/central difference approximation error of first derivative versus step size h.


Figure 5.1a/b shows how the error bounds of the difference approximations (5.1.4)/(5.1.8) for the first derivative vary with the step size h, implying that there is some optimal value of step size h with which the error bound of the numerical derivative is minimized. It seems that we might be able to get the optimal step size $h_o$ by using this kind of graph or directly using Eq. (5.2.2), (5.2.3), or (5.2.4). But, as mentioned before, this is not possible as long as the high-order derivatives are unknown (as is usually the case). Very fortunately, Tables 5.1 and 5.2 suggest that we might be able to guess a good value of h by watching how small $|D_{ik} - D_{i(k-1)}|$ is for a given problem. On the other hand, Fig. 5.2a/b shows the tangential lines based on the forward/central difference approximations (5.1.4)/(5.1.8) of the first derivative at x = π/4 with three values of step size h. They imply that there is some optimal step size $h_o$ and that the numerical approximation error becomes larger if we make the step size h larger or smaller than that value.




Figure 5.2 Forward/central difference approximation of first derivative of f(x) = sin x: (a) forward difference approximation by Eq. (5.1.4); (b) central difference approximation by Eq. (5.1.8).
5.3 DIFFERENCE APPROXIMATION FOR SECOND AND HIGHER DERIVATIVE

In order to obtain an approximation formula for the second derivative, we take the Taylor series expansion of f(x + h) and f(x - h) up to the fifth order to write

$$ f(x+h) = f(x) + hf'(x) + \frac{h^2}{2!}f^{(2)}(x) + \frac{h^3}{3!}f^{(3)}(x) + \frac{h^4}{4!}f^{(4)}(x) + \frac{h^5}{5!}f^{(5)}(x) + \cdots $$
$$ f(x-h) = f(x) - hf'(x) + \frac{h^2}{2!}f^{(2)}(x) - \frac{h^3}{3!}f^{(3)}(x) + \frac{h^4}{4!}f^{(4)}(x) - \frac{h^5}{5!}f^{(5)}(x) + \cdots $$

Adding these two equations (to remove the f'(x) terms) and then subtracting 2f(x) from both sides and dividing both sides by h² yields the central difference approximation for the second derivative as

$$ D_{c2}^{(2)}(x, h) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} = f^{(2)}(x) + \frac{h^2}{12}f^{(4)}(x) + \frac{2h^4}{6!}f^{(6)}(x) + \cdots \qquad (5.3.1) $$

which has a truncation error of O(h²).

Richardson’s extrapolation can be used for manipulating this equation to remove the h² term, which yields an improved version

$$ \frac{2^2 D_{c2}^{(2)}(x, h) - D_{c2}^{(2)}(x, 2h)}{2^2 - 1} = \frac{-f(x+2h) + 16f(x+h) - 30f(x) + 16f(x-h) - f(x-2h)}{12h^2} = f^{(2)}(x) - \frac{h^4}{90}f^{(6)}(x) + \cdots $$
$$ D_{c4}^{(2)}(x, h) = \frac{-f(x+2h) + 16f(x+h) - 30f(x) + 16f(x-h) - f(x-2h)}{12h^2} = f^{(2)}(x) + O(h^4) \qquad (5.3.2) $$

which has a truncation error of O(h⁴).
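A quick numerical check of (5.3.1) and (5.3.2) is sketched below, again for f(x) = sin x at x = π/4 with an arbitrarily chosen h = 0.1. For this f the leading error terms predicted above are $\frac{h^2}{12}f^{(4)}(x) = \frac{h^2}{12}\sin x$ and $-\frac{h^4}{90}f^{(6)}(x) = \frac{h^4}{90}\sin x$, so the script prints the measured errors above those predictions.

```matlab
% Central difference approximations (5.3.1) and (5.3.2) of f''(x) for f = sin
f = @(x) sin(x); x = pi/4; h = 0.1; true_d2f = -sin(x);
D2c2 = (f(x + h) - 2*f(x) + f(x - h))/h^2;                                          % (5.3.1)
D2c4 = (-f(x + 2*h) + 16*f(x + h) - 30*f(x) + 16*f(x - h) - f(x - 2*h))/(12*h^2);  % (5.3.2)
err  = [D2c2 D2c4] - true_d2f;
pred = [h^2/12*sin(x)  h^4/90*sin(x)];   % leading error terms for f = sin
disp([err; pred])
```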

The difference approximation formulas for the first and second derivatives derived so far are summarized in Table 5.3, where the following notations are used:

$D_{fi}^{(N)}/D_{bi}^{(N)}/D_{ci}^{(N)}$ is the forward/backward/central difference approximation for the Nth derivative having an error of $O(h^i)$ (h is the step size)
$f_k = f(x + kh)$
Now, we turn our attention to the high-order derivatives. But, instead of deriving the specific formulas, let’s make an algorithm to generate whatever difference approximation formula we want. For instance, if we want to get the approximation formula of the second derivative based on the function values $f_2$, $f_1$, $f_0$, $f_{-1}$, and $f_{-2}$, we write

$$ D_{c4}^{(2)}(x, h) = \frac{c_2 f_2 + c_1 f_1 + c_0 f_0 + c_{-1} f_{-1} + c_{-2} f_{-2}}{h^2} \qquad (5.3.3) $$


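and take the Taylor series expansion of $f_2$, $f_1$, $f_{-1}$, and $f_{-2}$ excluding $f_0$ on the right-hand side of this equation to rewrite it as

$$ \begin{aligned} D_{c4}^{(2)}(x, h) &= \frac{1}{h^2}\Big\{c_2\Big(f_0 + 2hf_0' + \frac{(2h)^2}{2!}f_0^{(2)} + \frac{(2h)^3}{3!}f_0^{(3)} + \frac{(2h)^4}{4!}f_0^{(4)} + \cdots\Big) + c_1\Big(f_0 + hf_0' + \frac{h^2}{2!}f_0^{(2)} + \frac{h^3}{3!}f_0^{(3)} + \frac{h^4}{4!}f_0^{(4)} + \cdots\Big) + c_0 f_0 \\ &\qquad + c_{-1}\Big(f_0 - hf_0' + \frac{h^2}{2!}f_0^{(2)} - \frac{h^3}{3!}f_0^{(3)} + \frac{h^4}{4!}f_0^{(4)} - \cdots\Big) + c_{-2}\Big(f_0 - 2hf_0' + \frac{(2h)^2}{2!}f_0^{(2)} - \frac{(2h)^3}{3!}f_0^{(3)} + \frac{(2h)^4}{4!}f_0^{(4)} - \cdots\Big)\Big\} \\ &= \frac{1}{h^2}\Big\{(c_2 + c_1 + c_0 + c_{-1} + c_{-2})f_0 + h(2c_2 + c_1 - c_{-1} - 2c_{-2})f_0' + h^2\Big(\frac{2^2}{2!}c_2 + \frac{1}{2!}c_1 + \frac{1}{2!}c_{-1} + \frac{2^2}{2!}c_{-2}\Big)f_0^{(2)} \\ &\qquad + h^3\Big(\frac{2^3}{3!}c_2 + \frac{1}{3!}c_1 - \frac{1}{3!}c_{-1} - \frac{2^3}{3!}c_{-2}\Big)f_0^{(3)} + h^4\Big(\frac{2^4}{4!}c_2 + \frac{1}{4!}c_1 + \frac{1}{4!}c_{-1} + \frac{2^4}{4!}c_{-2}\Big)f_0^{(4)} + \cdots\Big\} \end{aligned} \qquad (5.3.4) $$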


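We should solve the following set of equations to determine the coefficients $c_2$, $c_1$, $c_0$, $c_{-1}$, and $c_{-2}$ so as to make the expression conform to the second derivative $f_0^{(2)}$ at x + 0h = x.

$$ \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 2 & 1 & 0 & -1 & -2 \\ 2^2/2! & 1/2! & 0 & 1/2! & 2^2/2! \\ 2^3/3! & 1/3! & 0 & -1/3! & -2^3/3! \\ 2^4/4! & 1/4! & 0 & 1/4! & 2^4/4! \end{bmatrix} \begin{bmatrix} c_2 \\ c_1 \\ c_0 \\ c_{-1} \\ c_{-2} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \qquad (5.3.5) $$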
Table 5.3 The Difference Approximation Formulas for the First and Second Derivatives

O(h) forward difference approximation for the first derivative:
$$ D_{f1}(x, h) = \frac{f_1 - f_0}{h} \qquad (5.1.4) $$

O(h²) forward difference approximation for the first derivative:
$$ D_{f2}(x, h) = \frac{2D_{f1}(x, h) - D_{f1}(x, 2h)}{2 - 1} = \frac{-f_2 + 4f_1 - 3f_0}{2h} \qquad (5.1.5) $$

O(h) backward difference approximation for the first derivative:
$$ D_{b1}(x, h) = \frac{f_0 - f_{-1}}{h} \qquad (5.1.6) $$

O(h²) backward difference approximation for the first derivative:
$$ D_{b2}(x, h) = \frac{2D_{b1}(x, h) - D_{b1}(x, 2h)}{2 - 1} = \frac{3f_0 - 4f_{-1} + f_{-2}}{2h} \qquad (5.1.7) $$

O(h²) central difference approximation for the first derivative:
$$ D_{c2}(x, h) = \frac{f_1 - f_{-1}}{2h} \qquad (5.1.8) $$

O(h⁴) central difference approximation for the first derivative:
$$ D_{c4}(x, h) = \frac{2^2 D_{c2}(x, h) - D_{c2}(x, 2h)}{2^2 - 1} = \frac{-f_2 + 8f_1 - 8f_{-1} + f_{-2}}{12h} \qquad (5.1.9) $$

O(h²) central difference approximation for the second derivative:
$$ D_{c2}^{(2)}(x, h) = \frac{f_1 - 2f_0 + f_{-1}}{h^2} \qquad (5.3.1) $$

O(h⁴) central difference approximation for the second derivative:
$$ D_{c4}^{(2)}(x, h) = \frac{2^2 D_{c2}^{(2)}(x, h) - D_{c2}^{(2)}(x, 2h)}{2^2 - 1} = \frac{-f_2 + 16f_1 - 30f_0 + 16f_{-1} - f_{-2}}{12h^2} \qquad (5.3.2) $$

O(h²) central difference approximation for the fourth derivative:
$$ D_{c2}^{(4)}(x, h) = \frac{f_{-2} - 4f_{-1} + 6f_0 - 4f_1 + f_2}{h^4} \quad (\text{from difapx(4,[-2 2])}) \qquad (5.3.6) $$
function [c,err,eoh,A,b] = difapx(N,points)
%difapx.m to get the difference approximation for the Nth derivative
%points = [first last] offsets (in units of h) of the sample points used
l = max(points);
L = abs(points(1) - points(2)) + 1; %number of sample points
if L < N + 1, error('More points are needed!'); end
for n = 1:L
  A(1,n) = 1;
  for m = 2:L + 2, A(m,n) = A(m - 1,n)*l/(m - 1); end %A(m,n) = l^(m-1)/(m-1)!, Eq.(5.3.5)
  l = l - 1; %offset of the next sample point
end
b = zeros(L,1); b(N + 1) = 1;
c = (A(1:L,:)\b)'; %coefficients of difference approximation formula
err = A(L + 1,:)*c'; eoh = L - N; %coefficient & order of error term
if abs(err) < eps, err = A(L + 2,:)*c'; eoh = L - N + 1; end
if points(1) < points(2), c = fliplr(c); end %order c from points(1) to points(2)


The procedure of setting up this equation and solving it is cast into the MATLAB routine “difapx()”, which can be used to generate the coefficients of, say, the approximation formulas (5.1.7), (5.1.9), and (5.3.2) just for practice/verification/fun, whatever your purpose is.

>>format rat %to make all numbers represented in rational form
>>difapx(1,[0 -2]) %1st derivative based on {f0, f-1, f-2}
ans = 3/2 -2 1/2 %Eq.(5.1.7)

>>difapx(1,[-2 2]) %1st derivative based on {f-2, f-1, f0, f1, f2}
ans = 1/12 -2/3 0 2/3 -1/12 %Eq.(5.1.9)

>>difapx(2,[2 -2]) %2nd derivative based on {f2, f1, f0, f-1, f-2}
ans = -1/12 4/3 -5/2 4/3 -1/12 %Eq.(5.3.2)


Example 5.1. Numerical/Symbolic Differentiation for Taylor Series Expansion.
Consider how to use MATLAB to get the Taylor series expansion of a function, say, $e^{-x}$ about x = 0, which we already know is

$$ e^{-x} = 1 - x + \frac{1}{2!}x^2 - \frac{1}{3!}x^3 + \frac{1}{4!}x^4 - \frac{1}{5!}x^5 + \cdots $$

As a numerical method, we can use the MATLAB routine “difapx()”. On the other hand, we can also use the MATLAB command “taylor()”, which is a symbolic approach. Readers may put ‘help taylor’ into the MATLAB command window to see its usage, which is restated below.


• taylor(f) gives the fifth-order Maclaurin series expansion of f.

• taylor(f,n + 1) with an integer n > 0 gives the nth-order Maclaurin series expansion of f.

• taylor(f,a) with a real number (a) gives the fifth-order Taylor series expansion of f about a.

• taylor(f,n + 1,a) gives the nth-order Taylor series expansion of f about default_variable = a.

• taylor(f,n + 1,a,y) gives the nth-order Taylor series expansion of f(y) about y = a.

(cf) The target function f must be a legitimate expression given directly as the first input argument.

(cf) Before using the command “taylor()”, one should declare the arguments of the function as symbols by putting a statement like “syms x t”.

(cf) In the case where the function has several arguments, it is good practice to put the independent variable as the last input argument of “taylor()”, though taylor() takes the one closest (alphabetically) to ‘x’ as the independent variable by default only if it has been declared as a symbolic variable and is contained as an input argument of the function f.

(cf) One should use the MATLAB command “sym2poly()” to extract the coefficients from the Taylor series expansion obtained as a symbolic expression.


The following MATLAB program “nm5e01” finds the coefficients of the fifth-order Taylor series expansion of $e^{-x}$ about x = 0 by using the two methods.


%nm5e01: Nth-order Taylor series expansion for e^-x about xo in Ex 5.1
f = inline('exp(-x)','x');
N = 5; xo = 0;

%Numerical computation method
T(1) = feval(f,xo);
h = 0.005 %.01 or 0.001 make it worse
tmp = 1;
for i = 1:N
  tmp = tmp*i*h; %i!*h^i (factorial i times h^i)
  c = difapx(i,[-i i]); %coefficient of numerical derivative
  dix = c*feval(f,xo + [-i:i]*h)'; %h^i times the ith derivative
  T(i+1) = dix/tmp; %Taylor series coefficient
end
format rat, Tn = fliplr(T) %descending order

%Symbolic computation method
syms x; Ts = sym2poly(taylor(exp(-x),N + 1,xo))

%discrepancy
format short, discrepancy = norm(Tn - Ts)


5.4 INTERPOLATING POLYNOMIAL AND NUMERICAL DIFFERENTIAL

The difference approximation formulas derived in the previous sections are applicable only when the target function f(x) to differentiate is somehow given. In this section, we consider how to get numerical derivatives when we are given only a data file containing several data points. One possible approach is to construct an interpolating function by one of the methods explained in Chapter 3 and then differentiate the interpolating function.

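For simplicity, let’s reconsider the problem of finding the derivative of $f(x) = \sin x$ at $x = \pi/4$, where the function is given as one of the following data point sets:

$$\left\{\left(\tfrac{\pi}{8}, \sin\tfrac{\pi}{8}\right), \left(\tfrac{2\pi}{8}, \sin\tfrac{2\pi}{8}\right), \left(\tfrac{3\pi}{8}, \sin\tfrac{3\pi}{8}\right)\right\}$$

$$\left\{(0, \sin 0), \left(\tfrac{\pi}{8}, \sin\tfrac{\pi}{8}\right), \left(\tfrac{2\pi}{8}, \sin\tfrac{2\pi}{8}\right), \left(\tfrac{3\pi}{8}, \sin\tfrac{3\pi}{8}\right), \left(\tfrac{4\pi}{8}, \sin\tfrac{4\pi}{8}\right)\right\}$$

$$\left\{\left(\tfrac{2\pi}{16}, \sin\tfrac{2\pi}{16}\right), \left(\tfrac{3\pi}{16}, \sin\tfrac{3\pi}{16}\right), \left(\tfrac{4\pi}{16}, \sin\tfrac{4\pi}{16}\right), \left(\tfrac{5\pi}{16}, \sin\tfrac{5\pi}{16}\right), \left(\tfrac{6\pi}{16}, \sin\tfrac{6\pi}{16}\right)\right\}$$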

We make the MATLAB program “nm540”, which uses the routine “lagranp()” to find the interpolating polynomial, uses the routine “polyder()” to differentiate the polynomial, and computes the error of the resulting derivative from the true value. Let’s run it with x defined according to each of the given data point sets and see the results.

>>nm540
dfx( 0.78540) = 0.689072 (error: -0.018035) %with x = [1:3]*pi/8
dfx( 0.78540) = 0.706556 (error: -0.000550) %with x = [0:4]*pi/8
dfx( 0.78540) = 0.707072 (error: -0.000035) %with x = [2:6]*pi/16

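This illustrates that with more data points distributed closer to the target point, we may get a better result: the points of the third set are half as far apart as those of the second, and the error drops by a factor of about 16.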


%nm540
% to interpolate by Lagrange polynomial and get the derivative
clear, clf
x0 = pi/4;
df0 = cos(x0); % true value of the derivative of sin(x) at x0 = pi/4
for m = 1:3
   if m == 1, x = [1:3]*pi/8;
    elseif m == 2, x = [0:4]*pi/8;
    else x = [2:6]*pi/16;
   end
   y = sin(x);
   px = lagranp(x,y); % Lagrange polynomial interpolating (x,y)
   dpx = polyder(px); % derivative of polynomial px
   dfx = polyval(dpx, x0);
   fprintf('dfx(%6.4f) = %10.6f (error: %10.6f)\n', x0,dfx,dfx - df0);
end
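For comparison outside MATLAB, here is a minimal Python sketch using only the standard library. The helper “lagrange_poly()” plays the role of the book routine “lagranp()” (it builds the coefficients of the interpolating polynomial, here in ascending order of power rather than MATLAB's descending order), and “poly_der()”/“poly_val()” stand in for “polyder()”/“polyval()”. All helper names are illustrative.

```python
import math


def poly_mul(p, q):
    """Product of two polynomials stored as ascending-power coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def lagrange_poly(xs, ys):
    """Ascending-power coefficients of the Lagrange polynomial through (xs, ys)."""
    n = len(xs)
    coeffs = [0.0] * n
    for j in range(n):
        basis, denom = [1.0], 1.0
        for m in range(n):
            if m != j:
                basis = poly_mul(basis, [-xs[m], 1.0])   # multiply by (x - x_m)
                denom *= xs[j] - xs[m]
        for k, c in enumerate(basis):
            coeffs[k] += ys[j] * c / denom
    return coeffs


def poly_der(c):
    return [k * c[k] for k in range(1, len(c))]


def poly_val(c, x):
    acc = 0.0
    for coef in reversed(c):        # Horner's rule
        acc = acc * x + coef
    return acc


x0 = math.pi / 4
df0 = math.cos(x0)                  # true derivative of sin(x) at pi/4
point_sets = [
    [k * math.pi / 8 for k in range(1, 4)],     # x = [1:3]*pi/8
    [k * math.pi / 8 for k in range(0, 5)],     # x = [0:4]*pi/8
    [k * math.pi / 16 for k in range(2, 7)],    # x = [2:6]*pi/16
]
results = []
for xs in point_sets:
    px = lagrange_poly(xs, [math.sin(x) for x in xs])
    dfx = poly_val(poly_der(px), x0)
    results.append(dfx)
    print(f"dfx({x0:6.4f}) = {dfx:10.6f} (error: {dfx - df0:10.6f})")
```

For the first set, the quadratic through three equally spaced points has a derivative at the middle node equal to the central difference $(\sin(3\pi/8) - \sin(\pi/8))/(\pi/4) \approx 0.689072$, which matches the first line of the “nm540” output.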


One more thing to mention before closing this section is that MATLAB has the built-in routine “diff()”, which computes the difference vector of a given vector. When the data points $\{(x_k, f(x_k)),\ k = 1, 2, \ldots\}$ are given as an ASCII data file named “xy.dat”, we can use “diff()” to get the divided differences, which approximate the derivative of a continuous function.

>>load xy.dat %input the contents of 'xy.dat' as a matrix named xy
>>dydx = diff(xy(:,2))./diff(xy(:,1)); dydx' %divided difference
dydx = 2.0000 0.50000 2.0000


 k | x_k      | f(x_k)   | x_{k+1} - x_k  | f(x_{k+1}) - f(x_k) | D_k = (f(x_{k+1}) - f(x_k))/(x_{k+1} - x_k)
   | xy(:,1)  | xy(:,2)  | diff(xy(:,1))  | diff(xy(:,2))       |
 1 |   -1     |    2     |       1        |         2           |   2
 2 |    0     |    4     |       2        |         1           |   1/2
 3 |    2     |    5     |      -1        |        -2           |   2
 4 |    1     |    3     |                |                     |



5.5 NUMERICAL INTEGRATION AND QUADRATURE 


The general form of numerical integration of a function f(x) over some interval [a, b] is a weighted sum of the function values at a finite number (N + 1) of sample points (nodes), referred to as ‘quadrature’:

$$\int_a^b f(x)\,dx \cong \sum_{k=0}^{N} w_k f(x_k) \quad \text{with } a = x_0 < x_1 < \cdots < x_N = b \tag{5.5.1}$$


Here, the sample points are equally spaced for the midpoint rule, the trapezoidal 
rule, and Simpson’s rule, while they are chosen to be zeros of certain polynomials 
for Gaussian quadrature. 

Figure 5.3 shows the integrations over two segments by the midpoint rule, the trapezoidal rule, and Simpson’s rule. These are called Newton-Cotes formulas because they are based on approximating polynomials, and they are implemented by the following formulas.


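(midpoint rule)

$$\int_{x_k}^{x_{k+1}} f(x)\,dx \cong h f_{mk} \quad \text{with } h = x_{k+1} - x_k,\ f_{mk} = f(x_{mk}),\ x_{mk} = \frac{x_k + x_{k+1}}{2} \tag{5.5.2}$$

(trapezoidal rule)

$$\int_{x_k}^{x_{k+1}} f(x)\,dx \cong \frac{h}{2}(f_k + f_{k+1}) \quad \text{with } h = x_{k+1} - x_k,\ f_k = f(x_k) \tag{5.5.3}$$

(Simpson’s rule)

$$\int_{x_{k-1}}^{x_{k+1}} f(x)\,dx \cong \frac{h}{3}(f_{k-1} + 4f_k + f_{k+1}) \quad \text{with } h = \frac{x_{k+1} - x_{k-1}}{2} \tag{5.5.4}$$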
Figure 5.3 Various methods of numerical integration: (a) the midpoint rule; (b) the trapezoidal rule; (c) Simpson's rule.


These three integration rules are based on approximating the target function (integrand) by a zeroth-, first-, and second-degree polynomial, respectively. Since the first two integrations are obvious, we derive just Simpson’s rule (5.5.4). For simplicity, we shift the graph of f(x) by $-x_k$ along the x axis, or, equivalently, make the variable substitution $t = x - x_k$, so that the abscissas of the three points on the curve of f(x) change from $x = \{x_k - h, x_k, x_k + h\}$ to $t = \{-h, 0, +h\}$. Then, in order to find the coefficients of the second-degree polynomial

$$p_2(t) = c_1 t^2 + c_2 t + c_3 \tag{5.5.5}$$

matching the points $(-h, f_{k-1})$, $(0, f_k)$, $(+h, f_{k+1})$, we should solve the following set of equations:

$$\begin{aligned} p_2(-h) &= c_1(-h)^2 + c_2(-h) + c_3 = f_{k-1}\\ p_2(0) &= c_1 0^2 + c_2 0 + c_3 = f_k\\ p_2(+h) &= c_1(+h)^2 + c_2(+h) + c_3 = f_{k+1} \end{aligned}$$

to determine the coefficients $c_1$, $c_2$, and $c_3$ as

$$c_3 = f_k, \qquad c_2 = \frac{f_{k+1} - f_{k-1}}{2h}, \qquad c_1 = \frac{1}{h^2}\left(\frac{f_{k+1} + f_{k-1}}{2} - f_k\right)$$

Integrating the second-degree polynomial (5.5.5) with these coefficients from t = −h to t = h yields

$$\begin{aligned}\int_{-h}^{h} p_2(t)\,dt &= \left.\frac{1}{3}c_1 t^3 + \frac{1}{2}c_2 t^2 + c_3 t\right|_{-h}^{h} = \frac{2}{3}c_1 h^3 + 2c_3 h\\ &= \frac{2h}{3}\left(\frac{f_{k+1} + f_{k-1}}{2} - f_k + 3f_k\right) = \frac{h}{3}(f_{k-1} + 4f_k + f_{k+1})\end{aligned}$$

This is the Simpson integration formula (5.5.4).

Now, as a preliminary step toward diagnosing the errors of the above integration formulas, we take the Taylor series expansion of the integral function

$$g(x) = \int_{x_k}^{x} f(t)\,dt \quad \text{with } g'(x) = f(x),\ g^{(2)}(x) = f'(x),\ g^{(3)}(x) = f^{(2)}(x) \tag{5.5.6}$$

about the lower bound $x_k$ of the integration interval to write

$$g(x) = g(x_k) + g'(x_k)(x - x_k) + \frac{1}{2}g^{(2)}(x_k)(x - x_k)^2 + \frac{1}{3!}g^{(3)}(x_k)(x - x_k)^3 + \cdots$$

Substituting Eq. (5.5.6) together with $x = x_{k+1}$ and $x_{k+1} - x_k = h$ into this yields

$$\int_{x_k}^{x_{k+1}} f(x)\,dx = 0 + h f(x_k) + \frac{h^2}{2}f'(x_k) + \frac{h^3}{3!}f^{(2)}(x_k) + \frac{h^4}{4!}f^{(3)}(x_k) + \frac{h^5}{5!}f^{(4)}(x_k) + \cdots \tag{5.5.7}$$

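First, for the error analysis of the midpoint rule, we substitute $x_{k-1}$ and $-h = x_{k-1} - x_k$ in place of $x_{k+1}$ and h in this equation to write

$$\int_{x_k}^{x_{k-1}} f(x)\,dx = 0 - h f(x_k) + \frac{h^2}{2}f'(x_k) - \frac{h^3}{3!}f^{(2)}(x_k) + \frac{h^4}{4!}f^{(3)}(x_k) - \frac{h^5}{5!}f^{(4)}(x_k) + \cdots$$

and subtract this equation from Eq. (5.5.7) to write

$$\int_{x_k}^{x_{k+1}} f(x)\,dx - \int_{x_k}^{x_{k-1}} f(x)\,dx = \int_{x_k}^{x_{k+1}} f(x)\,dx + \int_{x_{k-1}}^{x_k} f(x)\,dx$$

$$= \int_{x_{k-1}}^{x_{k+1}} f(x)\,dx = 2h f(x_k) + \frac{2h^3}{3!}f^{(2)}(x_k) + \frac{2h^5}{5!}f^{(4)}(x_k) + \cdots \tag{5.5.8}$$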


Substituting $x_k$ and $x_{mk} = (x_k + x_{k+1})/2$ in place of $x_{k-1}$ and $x_k$ in this equation and noting that $x_{k+1} - x_{mk} = x_{mk} - x_k = h/2$, we obtain

$$\int_{x_k}^{x_{k+1}} f(x)\,dx = h f(x_{mk}) + \frac{2(h/2)^3}{3!}f^{(2)}(x_{mk}) + \frac{2(h/2)^5}{5!}f^{(4)}(x_{mk}) + \cdots$$

$$\int_{x_k}^{x_{k+1}} f(x)\,dx - h f(x_{mk}) = \frac{h^3}{24}f^{(2)}(x_{mk}) + \frac{h^5}{1920}f^{(4)}(x_{mk}) + \cdots = O(h^3) \tag{5.5.9}$$

This, together with Eq. (5.5.2), implies that the error of integration over one segment by the midpoint rule is proportional to $h^3$.

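Second, for the error analysis of the trapezoidal rule, we subtract Eq. (5.5.3) from Eq. (5.5.7) to write

$$\begin{aligned}&\int_{x_k}^{x_{k+1}} f(x)\,dx - \frac{h}{2}\big(f(x_k) + f(x_{k+1})\big)\\
&= h f(x_k) + \frac{h^2}{2}f'(x_k) + \frac{h^3}{3!}f^{(2)}(x_k) + \frac{h^4}{4!}f^{(3)}(x_k) + \frac{h^5}{5!}f^{(4)}(x_k) + \cdots\\
&\quad - \frac{h}{2}\left(f(x_k) + f(x_k) + h f'(x_k) + \frac{h^2}{2}f^{(2)}(x_k) + \frac{h^3}{3!}f^{(3)}(x_k) + \frac{h^4}{4!}f^{(4)}(x_k) + \cdots\right)\\
&= -\frac{h^3}{12}f^{(2)}(x_k) - \frac{h^4}{24}f^{(3)}(x_k) - \frac{h^5}{80}f^{(4)}(x_k) + O(h^6) = O(h^3)\end{aligned} \tag{5.5.10}$$

This implies that the error of integration over one segment by the trapezoidal rule is proportional to $h^3$.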

Third, for the error analysis of Simpson’s rule, we subtract the Taylor series expansion of Eq. (5.5.4)

$$\begin{aligned}&\frac{h}{3}\big(f(x_{k-1}) + 4f(x_k) + f(x_{k+1})\big)\\
&= \frac{h}{3}\left(f(x_k) + 4f(x_k) + f(x_k) + \frac{2h^2}{2}f^{(2)}(x_k) + \frac{2h^4}{4!}f^{(4)}(x_k) + \cdots\right)\\
&= 2h f(x_k) + \frac{h^3}{3}f^{(2)}(x_k) + \frac{h^5}{36}f^{(4)}(x_k) + \cdots\end{aligned}$$

from Eq. (5.5.8) to write

$$\int_{x_{k-1}}^{x_{k+1}} f(x)\,dx - \frac{h}{3}\big(f(x_{k-1}) + 4f(x_k) + f(x_{k+1})\big) = -\frac{h^5}{90}f^{(4)}(x_k) + O(h^7) = O(h^5) \tag{5.5.11}$$

This implies that the error of integration over two segments by Simpson’s rule is proportional to $h^5$.
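The orders $h^3$, $h^3$, and $h^5$ can be checked numerically. The Python sketch below (an illustration, not one of the book's routines) applies each single-panel rule to $f(x) = e^x$ on $[0, w]$ and on $[0, w/2]$; halving the width should shrink the error by roughly $2^3 = 8$ for the midpoint and trapezoidal rules and $2^5 = 32$ for Simpson's rule. The test function $e^x$ and the width w = 0.1 are arbitrary choices.

```python
import math


def midpoint(f, a, b):
    return (b - a) * f((a + b) / 2)                    # Eq. (5.5.2), one segment


def trapezoid(f, a, b):
    return (b - a) / 2 * (f(a) + f(b))                  # Eq. (5.5.3), one segment


def simpson(f, a, b):
    h = (b - a) / 2                                     # two segments of width h
    return h / 3 * (f(a) + 4 * f(a + h) + f(b))         # Eq. (5.5.4)


def error(rule, width):
    """Exact integral of e^x over [0, width] minus the rule's estimate."""
    return math.expm1(width) - rule(math.exp, 0.0, width)


w = 0.1
ratios = {rule.__name__: error(rule, w) / error(rule, w / 2)
          for rule in (midpoint, trapezoid, simpson)}
print(ratios)
```

The ratios come out slightly above 8 and 32 because of the higher-order terms and because the derivative factor is evaluated at a different point for each width.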

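Before closing this section, let’s use these error equations to find a way of estimating the error of a numerical integral from the true integral without knowing the derivatives of the target (integrand) function f(x). For this purpose, we investigate how the error of numerical integration by Simpson’s rule

$$I_S(x_{k-1}, x_{k+1}, h) = \frac{h}{3}\big(f(x_{k-1}) + 4f(x_k) + f(x_{k+1})\big)$$

will change if the segment width h is halved to h/2. Noting that, from Eq. (5.5.11),

$$E_S(h) = \int_{x_{k-1}}^{x_{k+1}} f(x)\,dx - I_S(x_{k-1}, x_{k+1}, h) \approx -\frac{h^5}{90}f^{(4)}(c) \quad (c \in [x_{k-1}, x_{k+1}])$$

$$\begin{aligned}E_S\!\left(\frac{h}{2}\right) &= \int_{x_{k-1}}^{x_{k+1}} f(x)\,dx - I_S\!\left(x_{k-1}, x_{k+1}, \frac{h}{2}\right)\\
&= \int_{x_{k-1}}^{x_k} f(x)\,dx - I_S\!\left(x_{k-1}, x_k, \frac{h}{2}\right) + \int_{x_k}^{x_{k+1}} f(x)\,dx - I_S\!\left(x_k, x_{k+1}, \frac{h}{2}\right)\\
&\approx -2\,\frac{(h/2)^5}{90}f^{(4)}(c) = \frac{1}{2^4}E_S(h) \quad (c \in [x_{k-1}, x_{k+1}])\end{aligned}$$

we can express the change of the error caused by halving the segment width as

$$\left|E_S(h) - E_S\!\left(\frac{h}{2}\right)\right| = \left|I_S\!\left(x_{k-1}, x_{k+1}, \frac{h}{2}\right) - I_S(x_{k-1}, x_{k+1}, h)\right| \approx \left|(2^4 - 1)\,E_S\!\left(\frac{h}{2}\right)\right| \tag{5.5.12}$$

This suggests the error estimate of numerical integration by Simpson’s rule as

$$\left|E_S\!\left(\frac{h}{2}\right)\right| \approx \frac{1}{2^4 - 1}\left|I_S(x_{k-1}, x_{k+1}, h) - I_S\!\left(x_{k-1}, x_{k+1}, \frac{h}{2}\right)\right| \tag{5.5.13}$$

A similar result can be derived for the trapezoidal rule:

$$\left|E_T\!\left(\frac{h}{2}\right)\right| \approx \frac{1}{2^2 - 1}\left|I_T(x_{k-1}, x_{k+1}, h) - I_T\!\left(x_{k-1}, x_{k+1}, \frac{h}{2}\right)\right| \tag{5.5.14}$$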

5.6 TRAPEZOIDAL METHOD AND SIMPSON METHOD 

In order to get the formulas for numerical integration of a function f(x) over some interval [a, b], we divide the interval into N segments of equal length h = (b − a)/N so that the nodes (sample points) can be expressed as $\{x_k = a + kh,\ k = 0, 1, 2, \ldots, N\}$. Then we have the numerical integration of f(x) over [a, b] by the trapezoidal rule (5.5.3) as

$$\int_a^b f(x)\,dx = \sum_{k=0}^{N-1}\int_{x_k}^{x_{k+1}} f(x)\,dx \cong \frac{h}{2}\{(f_0 + f_1) + (f_1 + f_2) + \cdots + (f_{N-2} + f_{N-1}) + (f_{N-1} + f_N)\}$$

$$I_{T2}(a, b, h) = h\left\{\frac{f(a) + f(b)}{2} + \sum_{k=1}^{N-1} f(x_k)\right\} \tag{5.6.1}$$

whose error is proportional to $h^2$, being N times the error for one segment [Eq. (5.5.10)], that is,

$$N\,O(h^3) = \frac{b - a}{h} \times O(h^3) = O(h^2)$$


On the other hand, we have the numerical integration of f(x) over [a, b] by Simpson’s rule (5.5.4) with an even number of segments N as

$$\int_a^b f(x)\,dx = \sum_{m=0}^{N/2-1}\int_{x_{2m}}^{x_{2m+2}} f(x)\,dx \cong \frac{h}{3}\{(f_0 + 4f_1 + f_2) + (f_2 + 4f_3 + f_4) + \cdots + (f_{N-2} + 4f_{N-1} + f_N)\}$$

$$\begin{aligned}I_{S4}(a, b, h) &= \frac{h}{3}\left\{f(a) + f(b) + 4\sum_{m=0}^{N/2-1} f(x_{2m+1}) + 2\sum_{m=1}^{N/2-1} f(x_{2m})\right\}\\ &= \frac{h}{3}\left\{f(a) + f(b) + 2\left(\sum_{m=0}^{N/2-1} f(x_{2m+1}) + \sum_{k=1}^{N-1} f(x_k)\right)\right\}\end{aligned} \tag{5.6.2}$$

whose error is proportional to $h^4$, being N/2 times the error for two segments [Eq. (5.5.11)], that is,

$$\frac{N}{2}\,O(h^5) = \frac{b - a}{2h} \times O(h^5) = O(h^4)$$


These two integration formulas by the trapezoidal rule and Simpson’s rule are cast into the MATLAB routines “trpzds()” and “smpsns()”, respectively.


function INTf = trpzds(f,a,b,N)
%integral of f(x) over [a,b] by trapezoidal rule with N segments
if abs(b - a) < eps || N <= 0, INTf = 0; return; end
h = (b - a)/N; x = a + [0:N]*h; fx = feval(f,x); %values of f for all nodes
INTf = h*((fx(1) + fx(N + 1))/2 + sum(fx(2:N))); %Eq.(5.6.1)


function INTf = smpsns(f,a,b,N,varargin)
%integral of f(x) over [a,b] by Simpson's rule with N segments
if nargin < 4, N = 100; end
if abs(b - a) < 1e-12 || N <= 0, INTf = 0; return; end
if mod(N,2) ~= 0, N = N + 1; end %make N even
h = (b - a)/N; x = a + [0:N]*h; %the boundary nodes for N segments
fx = feval(f,x,varargin{:}); %values of f for all nodes
fx(find(fx == inf)) = realmax; fx(find(fx == -inf)) = -realmax; %guard against infinite values
kodd = 2:2:N; keven = 3:2:N - 1; %MATLAB indices of x_1,x_3,... and x_2,x_4,...
INTf = h/3*(fx(1) + fx(N + 1) + 4*sum(fx(kodd)) + 2*sum(fx(keven))); %Eq.(5.6.2)
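For readers without MATLAB, the two routines translate almost line for line into Python. This is a sketch rather than the book's code: it works on plain lists instead of vectors, drops the “varargin” parameter passing and the infinity guard of “smpsns()”, and is checked against the trapezoidal and Simpson results for the integral (5.7.5) that are reported later in Section 5.7.

```python
import math
import sys


def trpzds(f, a, b, N):
    """Composite trapezoidal rule, Eq. (5.6.1), with N segments."""
    if abs(b - a) < sys.float_info.epsilon or N <= 0:
        return 0.0
    h = (b - a) / N
    fx = [f(a + k * h) for k in range(N + 1)]           # values of f at all nodes
    return h * ((fx[0] + fx[N]) / 2 + sum(fx[1:N]))


def smpsns(f, a, b, N=100):
    """Composite Simpson's rule, Eq. (5.6.2); an odd N is bumped up to be even."""
    if abs(b - a) < 1e-12 or N <= 0:
        return 0.0
    if N % 2 != 0:
        N += 1
    h = (b - a) / N
    fx = [f(a + k * h) for k in range(N + 1)]
    odd = sum(fx[1:N:2])          # f(x_1), f(x_3), ..., f(x_{N-1})
    even = sum(fx[2:N - 1:2])     # f(x_2), f(x_4), ..., f(x_{N-2})
    return h / 3 * (fx[0] + fx[N] + 4 * odd + 2 * even)


f = lambda x: 400 * x * (1 - x) * math.exp(-2 * x)     # integrand of Eq. (5.7.5)
true_I = 3200 * math.exp(-8)
It, Is = trpzds(f, 0, 4, 80), smpsns(f, 0, 4, 80)
print(It, It - true_I)
print(Is, Is - true_I)
```

With N = 80 the errors come out near $-8.2775\times10^{-2}$ and $-3.3223\times10^{-4}$, in line with the $O(h^2)$ and $O(h^4)$ behavior derived above.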
5.7 RECURSIVE RULE AND ROMBERG INTEGRATION


In this section, we look for a recursive formula that enables us to use a numerical integral computed with segment width h to produce another (hopefully better) numerical integral with half the segment width (h/2). Additionally, we apply Richardson extrapolation (Section 5.1) to two successive numerical integrals to build a Romberg table, which can be used to improve the accuracy of the numerical integral step by step.

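Let’s start with halving the segment width h to h/2 for the trapezoidal method. Then, the numerical integration formula (5.6.1) can be written in the recursive form as

$$\begin{aligned}I_{T2}\!\left(a, b, \frac{h}{2}\right) &= \frac{h}{2}\left\{\frac{f(a) + f(b)}{2} + \sum_{k=1}^{2N-1} f(x_{k/2})\right\}\\
&= \frac{h}{2}\left\{\frac{f(a) + f(b)}{2} + \sum_{m=1}^{N-1} f(x_{2m/2}) + \sum_{m=0}^{N-1} f(x_{(2m+1)/2})\right\}\\
&= \frac{1}{2}\left\{I_{T2}(a, b, h) + h\sum_{m=0}^{N-1} f(x_{(2m+1)/2})\ \text{(terms for inserted nodes)}\right\}\end{aligned} \tag{5.7.1}$$

where $x_{k/2} = a + k\,h/2$.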

Noting that the error of this formula is proportional to $h^2$ ($O(h^2)$), we apply Richardson extrapolation [Eq. (5.1.10)] to write a higher-level integration formula having an error of $O(h^4)$ as

$$\begin{aligned}I_{T4}(a, b, h) &= \frac{2^2 I_{T2}(a, b, h) - I_{T2}(a, b, 2h)}{2^2 - 1}\\
&\overset{(5.6.1)}{=} \frac{1}{3}\left\{4\,\frac{h}{2}\left(f(a) + f(b) + 2\sum_{k=1}^{N-1} f(x_k)\right) - \frac{2h}{2}\left(f(a) + f(b) + 2\sum_{m=1}^{N/2-1} f(x_{2m})\right)\right\}\\
&= \frac{h}{3}\left\{f(a) + f(b) + 4\sum_{m=1}^{N/2} f(x_{2m-1}) + 2\sum_{m=1}^{N/2-1} f(x_{2m})\right\}\end{aligned}$$

which coincides with Simpson’s integration formula (5.6.2). This implies that Simpson’s rule need not be treated separately from the trapezoidal rule: it is exactly what one Richardson extrapolation step produces from two trapezoidal estimates. Replacing h by h/2 in this equation yields

$$I_{T4}\!\left(a, b, \frac{h}{2}\right) = \frac{2^2 I_{T2}(a, b, h/2) - I_{T2}(a, b, h)}{2^2 - 1}$$

which can be generalized to the following formula:

$$I_{T,2(n+1)}\!\left(a, b, 2^{-(k+1)}h\right) = \frac{2^{2n} I_{T,2n}\!\left(a, b, 2^{-(k+1)}h\right) - I_{T,2n}\!\left(a, b, 2^{-k}h\right)}{2^{2n} - 1} \quad \text{for } n \ge 1,\ k \ge 0 \tag{5.7.3}$$
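The recursion (5.7.1), and the observation that one Richardson step turns two trapezoidal estimates into Simpson's formula, can both be confirmed numerically. In this Python sketch (function names are illustrative), “trap_halved()” implements Eq. (5.7.1) by reusing the previous trapezoidal value and evaluating f only at the inserted midpoints; the integrand is that of Eq. (5.7.5), and N = 20 is an arbitrary even choice.

```python
import math


def trap(f, a, b, N):
    """Direct composite trapezoidal rule I_T2(a, b, h), h = (b - a)/N, Eq. (5.6.1)."""
    h = (b - a) / N
    return h * ((f(a) + f(b)) / 2 + sum(f(a + k * h) for k in range(1, N)))


def trap_halved(f, a, b, N, I_prev):
    """Eq. (5.7.1): from I_T2 with N segments, get I_T2 with 2N segments.

    Only the N inserted midpoints a + (m + 1/2) h are new function evaluations.
    """
    h = (b - a) / N
    return (I_prev + h * sum(f(a + (m + 0.5) * h) for m in range(N))) / 2


def simpson(f, a, b, N):
    """Direct composite Simpson rule, Eq. (5.6.2); N must be even."""
    h = (b - a) / N
    odd = sum(f(a + k * h) for k in range(1, N, 2))
    even = sum(f(a + k * h) for k in range(2, N, 2))
    return h / 3 * (f(a) + f(b) + 4 * odd + 2 * even)


f = lambda x: 400 * x * (1 - x) * math.exp(-2 * x)    # integrand of Eq. (5.7.5)
a, b, N = 0.0, 4.0, 20                                # N = 20 segments of width h = 0.2

I_2h = trap(f, a, b, N // 2)                      # I_T2(a, b, 2h)
I_h = trap(f, a, b, N)                            # I_T2(a, b, h), computed directly
I_h_rec = trap_halved(f, a, b, N // 2, I_2h)      # I_T2(a, b, h) via Eq. (5.7.1)
I_T4 = (2 ** 2 * I_h - I_2h) / (2 ** 2 - 1)       # one Richardson step
print(I_h, I_h_rec)
print(I_T4, simpson(f, a, b, N))
```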

Now, it is time to introduce a systematic way, called Romberg integration, of improving the accuracy of the integral step by step and estimating the (truncation) error at each step to determine when to stop. It is implemented by a Romberg table (Table 5.4), that is, a lower-triangular matrix that we construct one row per iteration: we apply Eq. (5.7.1), halving the segment width h, to get the next-row element (downward in the first column), and apply Eq. (5.7.3), upgrading the order of error, to get the next-column elements (rightward in the row) from the up-left (north-west) element and the left (west) element. At each iteration k, we use Eq. (5.5.14), generalized to the error order $O(h^{2k})$ of column k, to estimate the truncation error as

$$\left|E_{T,2k}(2^{-k}h)\right| \approx \frac{1}{2^{2k} - 1}\left|I_{T,2k}(2^{-k}h) - I_{T,2k}(2^{-(k-1)}h)\right| \tag{5.7.4}$$

and stop the iteration when the estimated error becomes less than some prescribed tolerance. Then, the last diagonal element is taken to be ‘supposedly’ the best estimate of the integral. This sequential procedure of Romberg integration is cast into the MATLAB routine “rmbrg()”.

function [x,R,err,N] = rmbrg(f,a,b,tol,K)
%construct Romberg table to find definite integral of f over [a,b]
h = b - a; N = 1;
if nargin < 5, K = 10; end
R(1,1) = h/2*(feval(f,a) + feval(f,b));
for k = 2:K
   h = h/2; N = N*2;
   R(k,1) = R(k - 1,1)/2 + h*sum(feval(f,a + [1:2:N - 1]*h)); %Eq.(5.7.1)
   tmp = 1;
   for n = 2:k
      tmp = tmp*4;
      R(k,n) = (tmp*R(k,n - 1) - R(k - 1,n - 1))/(tmp - 1); %Eq.(5.7.3)
   end
   err = abs(R(k,k - 1) - R(k - 1,k - 1))/(tmp - 1); %Eq.(5.7.4)
   if err < tol, break; end
end
x = R(k,k);
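The same Romberg procedure can be sketched in Python. This illustrative translation keeps the structure of “rmbrg()” (row 0 of the list corresponds to MATLAB row 1, that is, iteration k = 0 of Table 5.4) and uses the recursion (5.7.1), the extrapolation (5.7.3), and the stopping test (5.7.4).

```python
import math


def rmbrg(f, a, b, tol, K=10):
    """Romberg integration; row k of R (0-based) uses segment width (b - a)/2^k."""
    h, N = b - a, 1
    R = [[h / 2 * (f(a) + f(b))]]                        # I_T2(a, b, h0)
    err = float("inf")
    for k in range(1, K):
        h /= 2
        N *= 2
        row = [R[k - 1][0] / 2 + h * sum(f(a + i * h) for i in range(1, N, 2))]  # Eq. (5.7.1)
        tmp = 1
        for n in range(1, k + 1):
            tmp *= 4
            row.append((tmp * row[n - 1] - R[k - 1][n - 1]) / (tmp - 1))       # Eq. (5.7.3)
        R.append(row)
        err = abs(row[k - 1] - R[k - 1][k - 1]) / (tmp - 1)                    # Eq. (5.7.4)
        if err < tol:
            break
    return R[-1][-1], R, err, N


f = lambda x: 400 * x * (1 - x) * math.exp(-2 * x)   # integrand of Eq. (5.7.5)
true_I = 3200 * math.exp(-8)
IR, R, err, N1 = rmbrg(f, 0.0, 4.0, 0.0005)
print(IR, IR - true_I, N1)
```

On the integral (5.7.5) with tol = 0.0005, the sketch stops once N reaches 32 segments, as in the MATLAB run shown in this section.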

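Table 5.4 Romberg Table

Iteration k | Segment width h | n = 1              | n = 2              | n = 3              | ...
0           | h_0             | I_{T,2}(h_0)       |                    |                    |
1           | 2^{-1}h_0       | I_{T,2}(2^{-1}h_0) | I_{T,4}(2^{-1}h_0) |                    |
2           | 2^{-2}h_0       | I_{T,2}(2^{-2}h_0) | I_{T,4}(2^{-2}h_0) | I_{T,6}(2^{-2}h_0) |
...

(Each first-column element is obtained from the one above it by Eq. (5.7.1); each element of column n + 1 is obtained from its west and north-west neighbors in column n by Eq. (5.7.3).)

Before closing this section, we test and compare the trapezoidal method (“trpzds()”), the Simpson method (“smpsns()”), and Romberg integration (“rmbrg()”) by trying them on the following integral:

$$\begin{aligned}\int_0^4 400x(1 - x)e^{-2x}\,dx &= 100\left\{\left[-2e^{-2x}x(1 - x)\right]_0^4 + 2\int_0^4 e^{-2x}(1 - 2x)\,dx\right\}\\
&= 100\left\{\left[-2e^{-2x}x(1 - x)\right]_0^4 + \left[-e^{-2x}(1 - 2x)\right]_0^4 - 2\int_0^4 e^{-2x}\,dx\right\}\\
&= \left.200x^2e^{-2x}\right|_0^4 = 3200e^{-8} = 1.07348040929\end{aligned} \tag{5.7.5}$$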

Here are the MATLAB statements for this job, listed together with the running results.

>>f = inline('400*x.*(1 - x).*exp(-2*x)', 'x');
>>a = 0; b = 4; N = 80;
>>format short e
>>true_I = 3200*exp(-8)
>>It = trpzds(f,a,b,N), errt = It - true_I %trapezoidal
It = 9.9071e-001, errt = -8.2775e-002
>>Is = smpsns(f,a,b,N), errs = Is - true_I %Simpson
Is = 1.0731e+000, errs = -3.3223e-004
>>[IR,R,err,N1] = rmbrg(f,a,b,.0005), errR = IR - true_I %Romberg
IR = 1.0734e+000, N1 = 32
errR = -3.4943e-005

As expected from the fact that the errors of numerical integration by the trapezoidal method and the Simpson method are $O(h^2)$ and $O(h^4)$, respectively, the Simpson method gives a better result (with smaller error) than the trapezoidal one with the same number of segments N = 80. Moreover, Romberg integration with only N = 32 gives a better result than both of them.


5.8 ADAPTIVE QUADRATURE 

The numerical integration methods in the previous sections divide the integration interval uniformly into segments of equal width, which makes the error nonuniform over the interval: small for the smooth portions and large for the swaying portions of the curve of the integrand f(x). In contrast, the strategy of adaptive quadrature is to divide the integration interval nonuniformly into segments of (generally) unequal lengths, that is, short segments for the swaying portions and long segments for the smooth portions of the curve of the integrand f(x), aiming at a smaller error with fewer segments.

The adaptive quadrature algorithm starts with a numerical integral (INTf) for the whole interval and the sum of numerical integrals (INTf12 = INTf1 + INTf2) for its two halves. Based on the difference between the two successive estimates INTf and INTf12, it estimates the error of INTf12 by using Eq. (5.5.13) or (5.5.14), depending on the basic integration rule. Then, if the error estimate is within a given tolerance (tol), it terminates with INTf12. Otherwise, it digs into each segment by repeating the same procedure with half of the tolerance (tol/2) assigned to each segment, until the deepest level satisfies the error condition. This is how the adaptive scheme forms segments of nonuniform width, as illustrated in Fig. 5.4. This algorithm fits naturally into the nested (recursive) calling structure introduced in Section 1.3 and is cast into the routine “adap_smpsn()”, which needs the calling routine “adapt_smpsn()” for start-up.

Figure 5.4 The subintervals (segments) and their boundary points (nodes) determined by the adaptive Simpson method.


function [INTf,nodes,err] = adap_smpsn(f,a,b,INTf,tol,varargin)
%adaptive recursive Simpson method
c = (a + b)/2;
INTf1 = smpsns(f,a,c,1,varargin{:});
INTf2 = smpsns(f,c,b,1,varargin{:});
INTf12 = INTf1 + INTf2;
err = abs(INTf12 - INTf)/15; % Error estimate by Eq.(5.5.13)
if isnan(err) || err < tol || tol < eps % NaN? Satisfying error? Too deep level?
   INTf = INTf12;
   nodes = [a c b];
else
   [INTf1,nodes1,err1] = adap_smpsn(f,a,c,INTf1,tol/2,varargin{:});
   [INTf2,nodes2,err2] = adap_smpsn(f,c,b,INTf2,tol/2,varargin{:});
   INTf = INTf1 + INTf2;
   nodes = [nodes1 nodes2(2:length(nodes2))];
   err = err1 + err2;
end


function [INTf,nodes,err] = adapt_smpsn(f,a,b,tol,varargin) 
%apply adaptive recursive Simpson method 
INTf = smpsns(f,a,b,1,varargin{:}); 

[INTf,nodes,err] = adap_smpsn(f,a,b,INTf,tol,varargin{:}); 


We can apply these routines to get the approximate value of the integral (5.7.5) by typing the following MATLAB statements into the MATLAB command window.

>>f = inline('400*x.*(1 - x).*exp(-2*x)', 'x');
>>a = 0; b = 4; tol = 0.001;
>>format short e
>>true_I = 3200*exp(-8);
>>Ias = adapt_smpsn(f,a,b,tol), erras = Ias - true_I
Ias = 1.0735e+000, erras = -8.9983e-006

Figure 5.4 shows the curve of the integrand $f(x) = 400x(1 - x)e^{-2x}$ together with the 25 nodes determined by the routine “adapt_smpsn()”, which yields a better result (smaller error) with fewer segments than the other methods discussed so far. From this figure, we see that the nodes are dense in the swaying portion and sparse in the smooth portion of the curve of the integrand.
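Here is a Python sketch of the same recursion, assuming a two-segment Simpson rule as the basic rule (what “smpsns(f,a,b,1)” computes once N is bumped to 2). The names mirror the MATLAB routines; parameter passing through “varargin” is omitted.

```python
import math
import sys


def smpsn2(f, a, b):
    """Simpson's rule on [a, b] with two segments, i.e. smpsns(f,a,b,1) with N bumped to 2."""
    return (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))


def adap_smpsn(f, a, b, INTf, tol):
    c = (a + b) / 2
    INTf1, INTf2 = smpsn2(f, a, c), smpsn2(f, c, b)
    INTf12 = INTf1 + INTf2
    err = abs(INTf12 - INTf) / 15                          # error estimate, Eq. (5.5.13)
    if math.isnan(err) or err < tol or tol < sys.float_info.epsilon:
        return INTf12, [a, c, b], err                      # accept this level
    INTf1, nodes1, err1 = adap_smpsn(f, a, c, INTf1, tol / 2)   # dig into each half
    INTf2, nodes2, err2 = adap_smpsn(f, c, b, INTf2, tol / 2)
    return INTf1 + INTf2, nodes1 + nodes2[1:], err1 + err2


def adapt_smpsn(f, a, b, tol):
    return adap_smpsn(f, a, b, smpsn2(f, a, b), tol)


f = lambda x: 400 * x * (1 - x) * math.exp(-2 * x)        # integrand of Eq. (5.7.5)
true_I = 3200 * math.exp(-8)
Ias, nodes, err = adapt_smpsn(f, 0.0, 4.0, 0.001)
print(Ias, Ias - true_I, len(nodes))
```

The returned node list plays the role of the nodes plotted in Fig. 5.4.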

Here, we introduce the MATLAB built-in routines adopting the adaptive recursive integration scheme, together with an illustrative example of their usage.

"quad(f,a,b,tol,trace,p1,p2,..)" / "quadl(f,a,b,tol,trace,p1,p2,..)"

>>Iq = quad(f,a,b,tol), errq = Iq - true_I
Iq = 1.0735e+000, errq = 4.0107e-005
>>Iql = quadl(f,a,b,tol), errql = Iql - true_I
Iql = 1.0735e+000, errql = -1.2168e-008
(cf) These routines can pass the parameters (p1, p2, ...) to the integrand (target) function and can be asked to show a list of intermediate subintervals with the fifth input argument trace = 1.

(cf) quadl() was introduced in MATLAB 6.x to replace another adaptive integration routine, quad8(), which is available in MATLAB 5.x.

Additionally, note that MATLAB has a symbolic integration routine “int(f,a,b)”. Readers may type “help int” into the MATLAB command window to see its usage, which is restated below.

• int(f) gives the indefinite integral of f with respect to its independent variable (the one closest to ‘x’).

• int(f,v) gives the indefinite integral of f(v) with respect to v given as the second input argument.

• int(f,a,b) gives the definite integral of f over [a,b] with respect to its independent variable.

• int(f,v,a,b) gives the definite integral of f(v) with respect to v over [a,b].

(cf) The target function f must be a legitimate expression given directly as the first input argument, and the lower/upper bounds a, b of the integration interval can be symbolic scalars or numbers.


Example 5.2. Numerical/Symbolic Integration using quad()/quadl()/int().

Consider how to use MATLAB to obtain the continuous-time Fourier series (CtFS) coefficient

$$X_k = \int_{-P/2}^{P/2} x(t)e^{-jk\omega_0 t}\,dt = \int_{-P/2}^{P/2} x(t)e^{-j2\pi kt/P}\,dt \tag{E5.2.1}$$

For simplicity, let’s try to get just the 16th CtFS coefficient of a rectangular wave

$$x(t) = \begin{cases}1 & \text{for } -1 < t < 1\\ 0 & \text{for } -2 < t < -1 \text{ or } 1 < t < 2\end{cases} \tag{E5.2.2}$$

which is periodic in t with period P = 4. We can compute it analytically as

$$X_{16} = \int_{-2}^{2} x(t)e^{-j2\pi 16t/4}\,dt = \int_{-1}^{1} e^{-j8\pi t}\,dt = \left.\frac{1}{-j8\pi}e^{-j8\pi t}\right|_{-1}^{1} = \left.\frac{1}{8\pi}\sin(8\pi t)\right|_{-1}^{1} = 0 \tag{E5.2.3}$$
%nm5e02
%use quad()/quadl() and int() to get CtFS coefficient X16 in Ex 5.2
ftn = 'exp(-j*k*w0*t)'; fcos = inline(ftn,'t','k','w0');
P = 4; k = 16; w0 = 2*pi/P;
a = -1; b = 1; tol = 0.001; trace = 0; %set trace = 1 to list the subintervals
X16_quad = quad(fcos,a,b,tol,trace,k,w0)
X16_quadl = quadl(fcos,a,b,tol,trace,k,w0)
syms t; % declare symbolic variable
Iexp = int(exp(-j*k*w0*t),t) % symbolic indefinite integral
Icos = int(cos(k*w0*t),t) % symbolic indefinite integral
X16_sym = int(cos(k*w0*t),t,-1,1) % symbolic definite integral


As a numerical approach, we can use the MATLAB routine “quad()”/“quadl()”. On the other hand, we can also use the MATLAB routine “int()”, which is a symbolic approach. We put all the statements together to make the MATLAB program “nm5e02”, in which the fifth input argument (trace) of “quad()”/“quadl()” can be set to 1 so that we can see their nodes and tell how different they are. Let’s run it and see the results.

>>nm5e02
X16_quad = 0.8150 + 0.0000i %betrayal of MATLAB?
X16_quadl = 7.4771e-008 %almost zero, OK!
Iexp = 1/8*i/pi*exp(-8*i*pi*t) %(E5.2.3) by symbolic computation
Icos = 1/8/pi*sin(8*pi*t) %(E5.2.3) by symbolic computation
X16_sym = 0 %exact answer by symbolic computation

What a surprise! It is totally unexpected that the MATLAB routine “quad()” gives us a quite eccentric value (0.8150), without even a warning message. The routine “quad()” evidently cannot be trusted for a piecewise-constant function multiplied by a periodic function like this one. This seems to imply that “quadl()” is better than “quad()” and that “int()” is the best of the three commands. It should, however, be noted that “int()” can directly accept and handle only functions composed of basic mathematical functions, rejecting functions defined in the form of a string, by the “inline()” command, or through an M-file; besides, it takes a long time to execute.

(cf) What about our own routine “adapt_smpsn()”? Regrettably, you had better not count on it, since it gives the wrong answer for this problem. In fact, “quadl()” is much more reliable than both “quad()” and “adapt_smpsn()”.
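Why “adapt_smpsn()” is fooled here can be seen by hand: on [−1, 1], every node visited in its first two levels is a multiple of 1/2, where $\cos(8\pi t) = 1$. The Python sketch below uses the real part $\cos(8\pi t)$ of the integrand (the imaginary part is odd and integrates to zero by symmetry) and shows that the whole-interval and two-half Simpson estimates agree at 2, so the error estimate (5.5.13) is essentially zero and the recursion stops, although the true value is 0.

```python
import math


def simpson2(f, a, b):
    """Simpson's rule on [a, b] with two segments, the basic rule inside adap_smpsn()."""
    return (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))


k, w0 = 16, 2 * math.pi / 4                   # X_16 with period P = 4, as in Example 5.2
f = lambda t: math.cos(k * w0 * t)            # real part of exp(-j*k*w0*t)

whole = simpson2(f, -1.0, 1.0)                                 # nodes -1, 0, 1
halves = simpson2(f, -1.0, 0.0) + simpson2(f, 0.0, 1.0)        # adds nodes -0.5 and 0.5
err_estimate = abs(halves - whole) / 15                        # Eq. (5.5.13)
true_value = 2 * math.sin(k * w0) / (k * w0)                   # integral of cos(8*pi*t) over [-1, 1]
print(whole, halves, err_estimate, true_value)
```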


5.9 GAUSS QUADRATURE 

In this section, we cover several kinds of Gauss quadrature methods, that is, Gauss-Legendre integration, Gauss-Hermite integration, Gauss-Laguerre integration, and Gauss-Chebyshev I and II integration. They approximate the following integrals, respectively:

$$\int_{-1}^{+1} f(t)\,dt, \quad \int_{-\infty}^{+\infty} e^{-t^2}f(t)\,dt, \quad \int_{0}^{+\infty} e^{-t}f(t)\,dt, \quad \int_{-1}^{+1}\frac{1}{\sqrt{1 - t^2}}f(t)\,dt, \quad \int_{-1}^{+1}\sqrt{1 - t^2}\,f(t)\,dt$$

The problem is how to fix the weights $w_i$'s and the (Gauss) grid points $t_i$'s.

5.9.1 Gauss-Legendre Integration 

If the integrand f(t) is a polynomial of degree ≤ 3 (= 2N − 1), then its integral

$$I(-1, 1) = \int_{-1}^{+1} f(t)\,dt \tag{5.9.1}$$

can be obtained exactly from just 2 (N) points by using the following formula:

$$I[t_1, t_2] = w_1 f(t_1) + w_2 f(t_2) \tag{5.9.2}$$

How marvelous it is! It is almost magic. Do you doubt it? Then, let’s find the weights $w_1$, $w_2$ and the grid points $t_1$, $t_2$ such that the approximating formula (5.9.2) equals the integral (5.9.1) for f(t) = 1 (of degree 0), t (of degree 1), $t^2$ (of degree 2), and $t^3$ (of degree 3). In order to do so, we should solve the following system of equations:

$$f(t) = 1:\quad w_1 f(t_1) + w_2 f(t_2) = w_1 + w_2 = \int_{-1}^{1} 1\,dt = 2 \tag{5.9.3a}$$

$$f(t) = t:\quad w_1 f(t_1) + w_2 f(t_2) = w_1 t_1 + w_2 t_2 = \int_{-1}^{1} t\,dt = 0 \tag{5.9.3b}$$

$$f(t) = t^2:\quad w_1 f(t_1) + w_2 f(t_2) = w_1 t_1^2 + w_2 t_2^2 = \int_{-1}^{1} t^2\,dt = \frac{2}{3} \tag{5.9.3c}$$

$$f(t) = t^3:\quad w_1 f(t_1) + w_2 f(t_2) = w_1 t_1^3 + w_2 t_2^3 = \int_{-1}^{1} t^3\,dt = 0 \tag{5.9.3d}$$

Multiplying (5.9.3b) by $t_1^2$ and subtracting the result from (5.9.3d) yields

$$w_2(t_2^3 - t_1^2 t_2) = w_2 t_2(t_2 + t_1)(t_2 - t_1) = 0 \;\rightarrow\; t_2 = -t_1 \quad (t_2 = t_1 \text{ is meaningless})$$

$$t_2 = -t_1 \xrightarrow{(5.9.3b)} (w_1 - w_2)t_1 = 0,\quad w_1 = w_2 \xrightarrow{(5.9.3a)} w_1 + w_1 = 2$$

$$w_1 = w_2 = 1 \xrightarrow{(5.9.3c)} t_1^2 + (-t_1)^2 = \frac{2}{3},\quad t_1 = -t_2 = -\frac{1}{\sqrt{3}}$$
so that Eq. (5.9.2) becomes

$$I[t_1, t_2] = f\!\left(-\frac{1}{\sqrt{3}}\right) + f\!\left(\frac{1}{\sqrt{3}}\right) \tag{5.9.4}$$

We can expect this approximating formula to give us the exact value of the integral (5.9.1) when the integrand f(t) is a polynomial of degree ≤ 3.
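The claim is easy to test. This Python sketch applies the two-point formula (5.9.4) to the monomials $t^0, \ldots, t^5$ and compares the results with the exact integrals $(1 - (-1)^{p+1})/(p + 1)$ from Eq. (5.9.8). The formula should match through $t^3$ (and for $t^5$, by symmetry) but not for $t^4$.

```python
import math

t1, t2 = -1 / math.sqrt(3), 1 / math.sqrt(3)   # grid points obtained from (5.9.3)
w1 = w2 = 1.0                                  # weights obtained from (5.9.3)


def gauss_legendre_2pt(f):
    return w1 * f(t1) + w2 * f(t2)             # Eq. (5.9.4)


rows = []
for p in range(6):
    exact = (1 - (-1) ** (p + 1)) / (p + 1)    # integral of t^p over [-1, 1], cf. Eq. (5.9.8)
    approx = gauss_legendre_2pt(lambda t: t ** p)
    rows.append((p, exact, approx))
    print(p, exact, approx)
```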

Now, you may wonder how to generalize this two-point Gauss-Legendre integration formula to an N-point case, since a system of nonlinear equations like Eq. (5.9.3) can be very difficult to solve as the dimension increases. But don’t worry about it. The N grid points ($t_i$'s) of the Gauss-Legendre integration formula

$$I_{GL}[t_1, t_2, \ldots, t_N] = \sum_{i=1}^{N} w_{N,i} f(t_i) \tag{5.9.5}$$

giving us the exact integral of an integrand polynomial of degree ≤ (2N − 1) can be obtained as the zeros of the Nth-degree Legendre polynomial [K-1, Section 4.3]

$$L_N(t) = \sum_{i=0}^{\lfloor N/2 \rfloor} (-1)^i \frac{(2N - 2i)!}{2^N\, i!\,(N - i)!\,(N - 2i)!}\, t^{N-2i} \tag{5.9.6a}$$

$$L_N(t) = \frac{1}{N}\big((2N - 1)t\,L_{N-1}(t) - (N - 1)L_{N-2}(t)\big) \tag{5.9.6b}$$


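Given the N grid points $t_i$'s, we can get the corresponding weights $w_{N,i}$'s of the N-point Gauss-Legendre integration formula by solving the system of linear equations

$$\begin{bmatrix} 1 & 1 & \cdots & 1\\ t_1 & t_2 & \cdots & t_N\\ \vdots & \vdots & & \vdots\\ t_1^{n-1} & t_2^{n-1} & \cdots & t_N^{n-1}\\ \vdots & \vdots & & \vdots\\ t_1^{N-1} & t_2^{N-1} & \cdots & t_N^{N-1}\end{bmatrix}\begin{bmatrix} w_{N,1}\\ w_{N,2}\\ \vdots\\ w_{N,n}\\ \vdots\\ w_{N,N}\end{bmatrix} = \begin{bmatrix} 2\\ 0\\ \vdots\\ (1 - (-1)^n)/n\\ \vdots\\ (1 - (-1)^N)/N\end{bmatrix} \tag{5.9.7}$$

where the nth element of the right-hand side (RHS) vector is

$$\mathrm{RHS}(n) = \int_{-1}^{1} t^{n-1}\,dt = \left.\frac{1}{n}t^n\right|_{-1}^{1} = \frac{1 - (-1)^n}{n} \tag{5.9.8}$$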

This procedure of finding the N grid points $t_i$'s and the weights $w_{N,i}$'s of the N-point Gauss-Legendre integration formula is cast into the MATLAB routine “Gausslp()”. We can get the two grid points and the two weights of the two-point Gauss-Legendre integration formula by just typing the following statement into the MATLAB command window.
function [t,w] = Gausslp(N)
if N < 0, error('Gauss-Legendre polynomial of negative order??'); end
t = roots(Lgndrp(N))'; %make it a row vector
A(1,:) = ones(1,N); b(1) = 2;
for n = 2:N % Eq.(5.9.7)
   A(n,:) = A(n - 1,:).*t;
   if mod(n,2) == 0, b(n) = 0;
    else b(n) = 2/n; % Eq.(5.9.8)
   end
end
w = b/A';

function p = Lgndrp(N) %Legendre polynomial
if N <= 0, p = 1; %n*Ln(t) = (2n - 1)t Ln-1(t) - (n - 1)Ln-2(t) Eq.(5.9.6b)
 elseif N == 1, p = [1 0];
 else p = ((2*N - 1)*[Lgndrp(N - 1) 0] - (N - 1)*[0 0 Lgndrp(N - 2)])/N;
end

function I = Gauss_Legendre(f,a,b,N,varargin)
%Gauss_Legendre integration of f over [a,b] with N grid points
% Never try N larger than 25
[t,w] = Gausslp(N);
x = ((b - a)*t + a + b)/2; %Eq.(5.9.9)
fx = feval(f,x,varargin{:});
I = w*fx'*(b - a)/2; %Eq.(5.9.10)


>>[t,w] = Gausslp(2)
t = 0.5774 -0.5774   w = 1 1


Even though we are happy with the N-point Gauss-Legendre integration formula (5.9.5) giving the exact integral of polynomials of degree ≤ (2N − 1), we do not feel comfortable with the fixed integration interval [−1, +1]. But we can be relieved of this concern, because any finite interval [a, b] can be transformed into [−1, +1] by the variable substitution known as the Gauss-Legendre translation

$$x = \frac{(b - a)t + a + b}{2} \tag{5.9.9}$$

Then, we can write the N-point Gauss-Legendre integration formula for the integration interval [a, b] as

$$I[a, b] = \int_a^b f(x)\,dx = \frac{b - a}{2}\int_{-1}^{1} f(x(t))\,dt$$

$$I[x_1, x_2, \ldots, x_N] = \frac{b - a}{2}\sum_{i=1}^{N} w_{N,i} f(x_i) \quad \text{with } x_i = \frac{(b - a)t_i + a + b}{2} \tag{5.9.10}$$

The scheme of integrating f(x) over the interval [a, b] by the N-point Gauss-Legendre formula is cast into the MATLAB routine “Gauss_Legendre()”. We can get the integral (5.7.5) by simply typing the following statements into the MATLAB command window. The result shows that the 10-point Gauss-Legendre formula yields better accuracy (smaller error) with even fewer nodes/segments than the other methods discussed so far.

>>f = inline('400*x.*(1 - x).*exp(-2*x)','x'); %Eq.(5.7.5)
>>format short e
>>true_I = 3200*exp(-8);
>>a = 0; b = 4; N = 10; %integration interval & number of nodes (grid points)
>>IGL = Gauss_Legendre(f,a,b,N), errGL = IGL - true_I
IGL = 1.0735e+000, errGL = 1.6289e-009
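A Python counterpart of “Gausslp()”/“Gauss_Legendre()” is sketched below. It deliberately swaps in a different way of getting the weights: instead of calling a polynomial root finder and solving the system (5.9.7), it refines each zero of $L_N(t)$ by Newton's method using the recurrence (5.9.6b) and then applies the standard closed-form Gauss-Legendre weight $w_{N,i} = 2/\big((1 - t_i^2)\,L_N'(t_i)^2\big)$. The initial guesses $\cos(\pi(i - 1/4)/(N + 1/2))$ are a common heuristic assumed here, not taken from the book.

```python
import math


def legendre_and_derivative(N, t):
    """L_N(t) via the recurrence (5.9.6b), and L_N'(t) via (t^2 - 1)L_N' = N(t L_N - L_{N-1})."""
    p_prev, p = 1.0, t
    for n in range(2, N + 1):
        p_prev, p = p, ((2 * n - 1) * t * p - (n - 1) * p_prev) / n
    dp = N * (t * p - p_prev) / (t * t - 1)
    return p, dp


def gausslp(N):
    """Grid points t_i (zeros of L_N) and weights w_{N,i} of the N-point rule."""
    ts, ws = [], []
    for i in range(1, N + 1):
        t = math.cos(math.pi * (i - 0.25) / (N + 0.5))   # initial guess near the i-th zero
        for _ in range(100):                             # Newton's method
            p, dp = legendre_and_derivative(N, t)
            step = p / dp
            t -= step
            if abs(step) < 1e-15:
                break
        _, dp = legendre_and_derivative(N, t)
        ts.append(t)
        ws.append(2 / ((1 - t * t) * dp * dp))          # closed-form weight
    return ts, ws


def gauss_legendre(f, a, b, N):
    ts, ws = gausslp(N)
    xs = [((b - a) * t + a + b) / 2 for t in ts]                 # Eq. (5.9.9)
    return (b - a) / 2 * sum(w * f(x) for w, x in zip(ws, xs))   # Eq. (5.9.10)


f = lambda x: 400 * x * (1 - x) * math.exp(-2 * x)   # integrand of Eq. (5.7.5)
true_I = 3200 * math.exp(-8)
print(gausslp(2))
IGL = gauss_legendre(f, 0.0, 4.0, 10)
print(IGL, IGL - true_I)
```

For N = 2 this sketch returns $\pm 0.57735$ with unit weights, as in the MATLAB session above, and the 10-point rule on (5.7.5) is accurate to well under $10^{-6}$.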

5.9.2 Gauss-Hermite Integration 

The Gauss-Hermite integration formula is expressed by Eq. (5.9.5) as

$$I_{GH}[t_1, t_2, \ldots, t_N] = \sum_{i=1}^{N} w_{N,i} f(t_i) \tag{5.9.11}$$

and is supposed to give us the exact integral of the exponential $e^{-t^2}$ multiplied by a polynomial f(t) of degree ≤ (2N − 1) over (−∞, +∞):

$$I = \int_{-\infty}^{+\infty} e^{-t^2} f(t)\,dt \tag{5.9.12}$$

The N grid points $t_i$'s can be obtained as the zeros of the Nth-degree Hermite polynomial [K-1, Section 4.8]

$$H_N(t) = \sum_{i=0}^{\lfloor N/2 \rfloor} \frac{(-1)^i}{i!} N(N - 1)\cdots(N - 2i + 1)(2t)^{N-2i} \tag{5.9.13a}$$

$$H_N(t) = 2t\,H_{N-1}(t) - H'_{N-1}(t) \tag{5.9.13b}$$


function [t,w] = Gausshp(N)
if N < 0
   error('Gauss-Hermite polynomial of negative degree??');
end
t = roots(Hermitp(N))';
A(1,:) = ones(1,N); b(1) = sqrt(pi);
for n = 2:N
   A(n,:) = A(n - 1,:).*t; %Eq.(5.9.7)
   if mod(n,2) == 1, b(n) = (n - 2)/2*b(n - 2); %Eq.(5.9.14)
    else b(n) = 0;
   end
end
w = b/A';

function p = Hermitp(N)
%Hn+1(x) = 2xHn(x) - Hn'(x) from 'Advanced Engineering Math' by Kreyszig
if N <= 0, p = 1;
 else p = [2 0];
   for n = 2:N, p = 2*[p 0] - [0 0 polyder(p)]; end %Eq.(5.9.13b)
end
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Given the N grid point we can get the weight S of the 77-point 
Gauss-Hermite integration formula by solving the system of linear equations 
like Eq. (5.9.7), but with the right-hand side (RHS) vector as 


RHS(l) = J e -‘ 2 dt = y/ J e-* 2 dx J dy 

= iLL‘ r * +, 1 >dxiy= IJl^ 2n ~ ir 

= ^J—ne~ r2 1 = yfn (5.9.14a) 

RHS(n) = J e _ 'V _1 dt = J (-2 t)e~‘ 2 ~^t n ^ 2 dt (= 0 if n is even) 

= -^ _,2? " _2 |~ oo + ~ 2) J e~ ,2 t n ~ 3 dt = i(n - 2)RHS(n - 2) 

(5.9.14b) 

The procedure for finding the N grid points $t_i$'s and the corresponding weights $w_{N,i}$'s of the N-point Gauss-Hermite integration formula is cast into the MATLAB routine “Gausshp()”. Note that, even if the integrand function $g(t)$ does not have $e^{-t^2}$ as a multiplying factor, we can multiply it by $e^{-t^2}e^{t^2} = 1$ to fabricate it as if it were like Eq. (5.9.12):

$$I = \int_{-\infty}^{\infty} g(t)\,dt = \int_{-\infty}^{\infty} e^{-t^2}\left(e^{t^2} g(t)\right)dt = \int_{-\infty}^{\infty} e^{-t^2} f(t)\,dt \qquad (5.9.15)$$
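As a cross-check of Eqs. (5.9.11)-(5.9.14), here is a minimal Python sketch that mirrors the two steps of “Gausshp()”: the grid points are the zeros of $H_N(t)$ (located here by sign-change scanning plus bisection instead of roots()), and the weights come from solving the moment equations whose RHS is Eq. (5.9.14). The function names are our own, not the book's, and the plain Vandermonde solve is only meant for small N (roughly N ≤ 10), where it stays well enough conditioned.

```python
import math

def hermite(n, t):
    """Physicists' Hermite polynomial H_n(t) via H_{k+1} = 2t*H_k - 2k*H_{k-1}."""
    h_prev, h = 1.0, 2.0 * t
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * t * h - 2.0 * k * h_prev
    return h

def hermite_zeros(n, steps=20000):
    """Zeros of H_n: scan a grid for sign changes, then refine each bracket by bisection."""
    bound = math.sqrt(2.0 * n + 1.0) + 1.0   # every zero of H_n lies inside (-bound, bound)
    grid = [-bound + 2.0 * bound * k / steps for k in range(steps + 1)]
    zeros = []
    for lo, hi in zip(grid, grid[1:]):
        flo, fhi = hermite(n, lo), hermite(n, hi)
        if flo == 0.0:
            zeros.append(lo)
        elif flo * fhi < 0.0:
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                fmid = hermite(n, mid)
                if flo * fmid <= 0.0:
                    hi = mid
                else:
                    lo, flo = mid, fmid
            zeros.append(0.5 * (lo + hi))
    return zeros

def solve(a, rhs):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def gauss_hermite_rule(n):
    """Grid points t_i and weights w_i of the n-point rule (5.9.11)."""
    t = hermite_zeros(n)
    moments = [math.sqrt(math.pi)] + [0.0] * (n - 1)   # Eq. (5.9.14a); odd powers stay 0
    for k in range(2, n, 2):                            # Eq. (5.9.14b) with 0-based power k
        moments[k] = (k - 1) / 2.0 * moments[k - 2]
    vander = [[ti ** k for ti in t] for k in range(n)]  # row k: sum_i w_i * t_i**k
    return t, solve(vander, moments)

def gauss_hermite(f, n):
    """Approximates the integral of exp(-t^2) * f(t) over (-inf, inf)."""
    t, w = gauss_hermite_rule(n)
    return sum(wi * f(ti) for ti, wi in zip(t, w))

# (P5.7.1): the integrand is even, so half of the full-line rule estimates the [0, inf) integral
approx = gauss_hermite(math.cos, 4) / 2
exact = math.sqrt(math.pi) / 2 * math.exp(-0.25)
print(approx, exact, approx - exact)
```

Solving the moment equations keeps the sketch close to the book's procedure; for larger N a closed-form weight formula or an eigenvalue-based method would be the more robust choice.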


5.9.3 Gauss-Laguerre Integration 

The Gauss-Laguerre integration formula is also expressed by Eq. (5.9.5) as

$$I_{GLa}[t_1, t_2, \ldots, t_N] = \sum_{i=1}^{N} w_{N,i}\, f(t_i) \qquad (5.9.16)$$

and is supposed to give us the exact integral of the exponential $e^{-t}$ multiplied by a polynomial $f(t)$ of degree $\le (2N - 1)$ over $[0, \infty)$:

$$I = \int_0^{\infty} e^{-t} f(t)\,dt \qquad (5.9.17)$$

The N grid points $t_i$'s can be obtained as the zeros of the Nth-degree Laguerre polynomial [K-1, Section 4.7]

$$L_N(t) = \sum_{i=0}^{N} \frac{(-1)^i\,N!}{i!\,(N-i)!\,i!}\,t^i \qquad (5.9.18)$$
Given the N grid points $t_i$'s, we can get the corresponding weights $w_{N,i}$'s of the N-point Gauss-Laguerre integration formula by solving the system of linear equations like Eq. (5.9.7), but with the right-hand side (RHS) vector as

$$\mathrm{RHS}(1) = \int_0^{\infty} e^{-t}\,dt = -e^{-t}\Big|_0^{\infty} = 1 \qquad (5.9.19a)$$

$$\mathrm{RHS}(n) = \int_0^{\infty} e^{-t}\,t^{n-1}\,dt = -e^{-t}t^{n-1}\Big|_0^{\infty} + (n-1)\int_0^{\infty} e^{-t}\,t^{n-2}\,dt = (n-1)\,\mathrm{RHS}(n-1) \qquad (5.9.19b)$$
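A similar Python sketch for Eqs. (5.9.16)-(5.9.19), again with function names of our own. To stay short it does not solve the moment equations; instead it uses the standard closed-form Gauss-Laguerre weight $w_{N,i} = t_i / [(N+1)^2 L_{N+1}(t_i)^2]$ (a swapped-in identity, not the book's procedure), and the usage lines check that these weights do satisfy the moment conditions (5.9.19), namely RHS(n) = (n-1)!. The zeros are again located by scanning and bisection, which is adequate for modest N.

```python
import math

def laguerre(n, t):
    """Laguerre polynomial L_n(t) via (k+1)L_{k+1} = (2k+1-t)L_k - k*L_{k-1}."""
    l_prev, l = 1.0, 1.0 - t
    if n == 0:
        return l_prev
    for k in range(1, n):
        l_prev, l = l, ((2 * k + 1 - t) * l - k * l_prev) / (k + 1)
    return l

def laguerre_zeros(n, steps=40000):
    """Zeros of L_n on (0, bound): sign-change scan followed by bisection."""
    bound = 4.0 * n + 6.0   # comfortably above the largest zero of L_n
    grid = [bound * k / steps for k in range(steps + 1)]
    zeros = []
    for lo, hi in zip(grid, grid[1:]):
        flo, fhi = laguerre(n, lo), laguerre(n, hi)
        if flo * fhi < 0.0:
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                fmid = laguerre(n, mid)
                if flo * fmid <= 0.0:
                    hi = mid
                else:
                    lo, flo = mid, fmid
            zeros.append(0.5 * (lo + hi))
    return zeros

def gauss_laguerre_rule(n):
    """Grid points t_i and (closed-form) weights w_i of the n-point rule (5.9.16)."""
    t = laguerre_zeros(n)
    w = [ti / ((n + 1) ** 2 * laguerre(n + 1, ti) ** 2) for ti in t]
    return t, w

def gauss_laguerre(f, n):
    """Approximates the integral of exp(-t) * f(t) over [0, inf)."""
    t, w = gauss_laguerre_rule(n)
    return sum(wi * f(ti) for ti, wi in zip(t, w))

# Moment check against Eq. (5.9.19): sum_i w_i * t_i**k should equal k! for k = 0..2N-1
t4, w4 = gauss_laguerre_rule(4)
for k in range(8):
    print(k, sum(wi * ti ** k for ti, wi in zip(t4, w4)), math.factorial(k))

print(gauss_laguerre(lambda t: t, 2))   # the integral (P5.8.1); should be 1 up to rounding
```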

5.9.4 Gauss-Chebyshev Integration 

The Gauss-Chebyshev I integration formula is also expressed by Eq. (5.9.5) as

$$I_{GC1}[t_1, t_2, \ldots, t_N] = \sum_{i=1}^{N} w_{N,i}\, f(t_i) \qquad (5.9.20)$$

and is supposed to give us the exact integral of $1/\sqrt{1-t^2}$ multiplied by a polynomial $f(t)$ of degree $\le (2N - 1)$ over $[-1, +1]$:

$$I = \int_{-1}^{+1} \frac{f(t)}{\sqrt{1-t^2}}\,dt \qquad (5.9.21)$$

The N grid points $t_i$'s are the zeros of the Nth-degree Chebyshev polynomial (Section 3.3)

$$t_i = \cos\frac{(2i-1)\pi}{2N} \quad \text{for } i = 1, 2, \ldots, N \qquad (5.9.22)$$

and the corresponding weights $w_{N,i}$'s are uniformly selected as

$$w_{N,i} = \pi/N, \quad \forall\, i = 1, \ldots, N \qquad (5.9.23)$$

The Gauss-Chebyshev II integration formula is also expressed by Eq. (5.9.5) as

$$I_{GC2}[t_1, t_2, \ldots, t_N] = \sum_{i=1}^{N} w_{N,i}\, f(t_i) \qquad (5.9.24)$$

and is supposed to give us the exact integral of $\sqrt{1-t^2}$ multiplied by a polynomial $f(t)$ of degree $\le (2N - 1)$ over $[-1, +1]$:

$$I = \int_{-1}^{+1} \sqrt{1-t^2}\, f(t)\,dt \qquad (5.9.25)$$

The N grid points $t_i$'s and the corresponding weights $w_{N,i}$'s are

$$t_i = \cos\left(\frac{i\pi}{N+1}\right), \quad w_{N,i} = \frac{\pi}{N+1}\sin^2\left(\frac{i\pi}{N+1}\right) \quad \text{for } i = 1, 2, \ldots, N \qquad (5.9.26)$$
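Because the Gauss-Chebyshev grid points and weights are available in closed form, Eqs. (5.9.22), (5.9.23), and (5.9.26), neither polynomial roots nor a linear solve is needed. A minimal Python sketch (function names are our own):

```python
import math

def gauss_chebyshev1(f, n):
    """Eqs. (5.9.20)-(5.9.23): integral of f(t)/sqrt(1 - t^2) over [-1, 1]."""
    nodes = [math.cos((2 * i - 1) * math.pi / (2 * n)) for i in range(1, n + 1)]
    return math.pi / n * sum(f(t) for t in nodes)

def gauss_chebyshev2(f, n):
    """Eqs. (5.9.24)-(5.9.26): integral of sqrt(1 - t^2) * f(t) over [-1, 1]."""
    total = 0.0
    for i in range(1, n + 1):
        theta = i * math.pi / (n + 1)
        total += math.pi / (n + 1) * math.sin(theta) ** 2 * f(math.cos(theta))
    return total

# Both rules are exact for f(t) = t^2 already with N = 2 (degree 2 <= 2N - 1 = 3)
print(gauss_chebyshev1(lambda t: t * t, 2), math.pi / 2)
print(gauss_chebyshev2(lambda t: t * t, 2), math.pi / 8)
```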


5.10 DOUBLE INTEGRAL 

In this section, we consider the numerical integration of a function f(x,y) with respect to two variables x and y over the integration region R = {(x,y) | a ≤ x ≤ b, c(x) ≤ y ≤ d(x)} as depicted in Fig. 5.5:

$$I = \iint_R f(x,y)\,dx\,dy = \int_a^b \left\{\int_{c(x)}^{d(x)} f(x,y)\,dy\right\} dx \qquad (5.10.1)$$

The numerical formula for this double integration over a two-dimensional region takes the form

$$I(a, b, c(x), d(x)) = \sum_{m=1}^{M} w_m \sum_{n=1}^{N} v_n\, f(x_m, y_{m,n}) \qquad (5.10.2)$$

where the weights $w_m$, $v_n$ depend on the method of one-dimensional integration we choose.



Figure 5.5 A region for a double integral. 
(cf) The MATLAB built-in routine dblquad() accepts the boundaries of the integration region only as numbers. Therefore, if we want to use the routine to compute a double integral over a nonrectangular region D, we should define the integrand function f(x,y) over a rectangular region R ⊇ D (containing the actual integration region D) in such a way that f(x,y) = 0 for (x,y) ∉ D; that is, the function value becomes zero outside the integration region D, which may result in more computations.


function INTfxy = int2s(f,a,b,c,d,M,N)
%double integral of f(x,y) over R = {(x,y)|a <= x <= b, c(x) <= y <= d(x)}
% using Simpson's rule
if ceil(M) ~= floor(M) %fixed width of segments on x
  hx = M; M = ceil((b - a)/hx);
end
if mod(M,2) ~= 0, M = M + 1; end
hx = (b - a)/M; m = 1:M + 1; x = a + (m - 1)*hx;
if isnumeric(c), cx(m) = c; %if c is given as a constant number
else cx(m) = feval(c,x(m)); %in case c is given as a function of x
end
if isnumeric(d), dx(m) = d; %if d is given as a constant number
else dx(m) = feval(d,x(m)); %in case d is given as a function of x
end
if ceil(N) ~= floor(N) %fixed width of segments on y
  hy = N; Nx(m) = ceil((dx(m) - cx(m))/hy);
  ind = find(mod(Nx(m),2) ~= 0); Nx(ind) = Nx(ind) + 1;
else %fixed number of subintervals
  if mod(N,2) ~= 0, N = N + 1; end
  Nx(m) = N;
end
for m = 1:M + 1
  sx(m) = smpsns_fxy(f,x(m),cx(m),dx(m),Nx(m));
end
kodd = 2:2:M; keven = 3:2:M - 1; %the set of odd/even indices
INTfxy = hx/3*(sx(1) + sx(M + 1) + 4*sum(sx(kodd)) + 2*sum(sx(keven)));


function INTf = smpsns_fxy(f,x,c,d,N)
%1-dimensional integration of f(x,y) for Ry = {c <= y <= d}
if nargin < 5, N = 100; end
if abs(d - c) < eps | N <= 0, INTf = 0; return; end
if mod(N,2) ~= 0, N = N + 1; end
h = (d - c)/N; y = c + [0:N]*h; fxy = feval(f,x,y);
fxy(find(fxy == inf)) = realmax; fxy(find(fxy == -inf)) = -realmax;
kodd = 2:2:N; keven = 3:2:N - 1; %the set of odd/even indices
INTf = h/3*(fxy(1) + fxy(N + 1) + 4*sum(fxy(kodd)) + 2*sum(fxy(keven)));

%nm510: the volume of a sphere
x = [-1:0.05:1]; y = [0:0.05:1]; [X,Y] = meshgrid(x,y);
f510 = inline('sqrt(max(1 - x.*x - y.*y,0))','x','y');
Z = f510(X,Y); mesh(x,y,Z);
a = -1; b = 1; c = 0; d = inline('sqrt(max(1 - x.*x,0))','x');
Vs1 = int2s(f510,a,b,c,d,100,100) %with fixed number of segments
error1 = Vs1 - pi/3
Vs2 = int2s(f510,a,b,c,d,0.01,0.01) %with fixed segment width
error2 = Vs2 - pi/3
Although the integration rules along the x axis and along the y axis do not need to be the same, we make a double integration routine “int2s(f,a,b,c,d,M,N)”, which uses the Simpson method for both integrations and calls another routine “smpsns_fxy()” for the one-dimensional integration along the y axis. The left/right boundary a/b of the integration region, given as the second/third input argument, must be a number, while the lower/upper boundary c/d of the integration region, given as the fourth/fifth input argument, may be either a number or a function of x. If the sixth/seventh input argument M/N is given as a positive integer, it will be accepted as the number of segments; otherwise, it will be interpreted as the segment width hx/hy. We also constructed a MATLAB program “nm510” in order to use the routine “int2s()” for finding one-fourth of the volume of a sphere with the radius r = 1 depicted in Fig. 5.6:

$$I = \int_{-1}^{1}\int_{0}^{\sqrt{1-x^2}} \sqrt{1 - x^2 - y^2}\;dy\,dx = \frac{\pi}{3} = 1.04719755\ldots \qquad (5.10.3)$$

Interested readers are recommended to work with these routines and run the program “nm510.m” to see the result.

>>nm510
Vs1 = 1.0470, error1 = -1.5315e-004
Vs2 = 1.0470, error2 = -1.9685e-004
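The structure of “int2s()”, an outer Simpson rule in x applied to inner Simpson integrals in y between c(x) and d(x), can be sketched in Python as follows. This is a simplified illustration with hypothetical names: it supports only a fixed number of segments, takes c and d as functions, and omits the Inf clipping done in “smpsns_fxy()”.

```python
import math

def simpson(g, lo, hi, n):
    """Composite Simpson's rule with n segments (rounded up to even); zero-width interval -> 0."""
    if abs(hi - lo) < 1e-15 or n <= 0:
        return 0.0
    n += n % 2
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    total += 4.0 * sum(g(lo + k * h) for k in range(1, n, 2))
    total += 2.0 * sum(g(lo + k * h) for k in range(2, n, 2))
    return h / 3.0 * total

def int2s_sketch(f, a, b, c, d, m, n):
    """Double integral of f(x, y) over a <= x <= b, c(x) <= y <= d(x)."""
    inner = lambda x: simpson(lambda y: f(x, y), c(x), d(x), n)
    return simpson(inner, a, b, m)

# One-fourth of the unit-sphere volume, Eq. (5.10.3); the true value is pi/3
f510 = lambda x, y: math.sqrt(max(1.0 - x * x - y * y, 0.0))
upper = lambda x: math.sqrt(max(1.0 - x * x, 0.0))
vol = int2s_sketch(f510, -1.0, 1.0, lambda x: 0.0, upper, 100, 100)
print(vol, vol - math.pi / 3)
```

With M = N = 100 the error should be small (of order 1e-4 or less); most of it comes from the square-root behaviour of the integrand at the curved boundary, where Simpson's rule loses its usual accuracy.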
PROBLEMS

5.1 Numerical Differentiation of Basic Functions 

If we want to find the derivative of a polynomial/trigonometric/exponential 
function, it would be more convenient and accurate to use an analytical 
computation (by hand) than to use a numerical computation (by computer). 
But, in order to test the accuracy of the numerical derivative formulas, 
consider the three basic functions as 

$$f_1(x) = x^3 - 2x, \quad f_2(x) = \sin x, \quad f_3(x) = e^x \qquad (P5.1.1)$$

(a) To find the first derivatives of these functions by using the formulas
(5.1.8) and (5.1.9) listed in Table 5.3 (Section 5.3), modify the program
“nm5p01.m”, which uses the MATLAB routine “difapx()” (Section 5.3)
for generating the coefficients of the numerical derivative formulas.
Fill in the following table with the error results obtained from running
the program.


First Derivatives

                                   h      (f_1 - f_-1)/(2h)   (-f_2 + 8f_1 - 8f_-1 + f_-2)/(12h)
(x^3 - 2x)'|x=1 = 1.00000000       0.1    1.0000e-02
                                   0.01                       9.1038e-15
(sin x)'|x=π/3 = 0.50000000        0.1    8.3292e-04
                                   0.01   8.3333e-06
(e^x)'|x=0 = 1.00000000            0.1                        3.3373e-06
                                   0.01   1.6667e-05



%nm5p01
f = inline('x.*(x.*x - 2)','x');
n = [1 -1]; x0 = 1; h = 0.1; DT = 1;
c = difapx(1,n); i = 1:length(c);
num = c*feval(f,x0 + (n(1) + 1 - i)*h)'; drv = num/h;
fprintf('with h = %6.4f, %12.6f %12.4e\n', h,drv,drv - DT);


(b) Likewise in (a), modify the program “nm5p01.m” in such a way that
the formulas (5.3.1) and (5.3.2) in Table 5.3 are generated and used to
find the second numerical derivatives. Fill in the following table with
the error results obtained from running the program.
Second Derivatives

                                    h      (f_1 - 2f_0 + f_-1)/h^2   (-f_2 + 16f_1 - 30f_0 + 16f_-1 - f_-2)/(12h^2)
(x^3 - 2x)''|x=1 = 6.0000000000     0.1    2.6654e-14
                                    0.01                             2.9470e-12
(sin x)''|x=π/3 = -0.8660254037     0.1                              9.6139e-07
                                    0.01   7.2169e-06
(e^x)''|x=0 = 1.0000000000          0.1    8.3361e-04
                                    0.01                             1.1183e-10
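The four difference formulas named in these tables, Eqs. (5.1.8), (5.1.9), (5.3.1), and (5.3.2), are easy to spot-check outside MATLAB. A short Python sketch with our own function names (it is not the “difapx()”-based program of the problem, which generates the coefficients instead of hard-coding them):

```python
import math

def d1_central(f, x, h):
    """Eq. (5.1.8): (f(x+h) - f(x-h)) / (2h), error O(h^2)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d1_five_point(f, x, h):
    """Eq. (5.1.9): (-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)) / (12h), error O(h^4)."""
    return (-f(x + 2 * h) + 8 * f(x + h) - 8 * f(x - h) + f(x - 2 * h)) / (12 * h)

def d2_central(f, x, h):
    """Eq. (5.3.1): (f(x+h) - 2f(x) + f(x-h)) / h^2, error O(h^2)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

def d2_five_point(f, x, h):
    """Eq. (5.3.2): (-f(x+2h) + 16f(x+h) - 30f(x) + 16f(x-h) - f(x-2h)) / (12h^2), error O(h^4)."""
    return (-f(x + 2 * h) + 16 * f(x + h) - 30 * f(x) + 16 * f(x - h) - f(x - 2 * h)) / (12 * h ** 2)

cubic = lambda x: x ** 3 - 2 * x
for h in (0.1, 0.01):
    print(h, d1_central(cubic, 1.0, h) - 1.0, d1_five_point(cubic, 1.0, h) - 1.0)
```

For the cubic, the central difference error is exactly h^2 (the third derivative is the constant 6), while the five-point formula is exact up to round-off; this is why entries like 9.1038e-15 appear in the table.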


5.2 Numerical Differentiation of a Function Given as a Set of Data Pairs 

Consider the three (numerical) functions each given as a set of five data 
pairs in Table P5.2. 


Table P5.2 Three Functions Each Given as a Set of Five Data Pairs

x        f1(x)     x        f2(x)    x         f3(x)
0.8000   -1.0880   0.8472   0.7494   -0.2000   1.2214
0.9000   -1.0710   0.9472   0.8118   -0.1000   1.1052
1.0000   -1.0000   1.0472   0.8660    0        1.0000
1.1000   -0.8690   1.1472   0.9116    0.1000   0.9048
1.2000   -0.6720   1.2472   0.9481    0.2000   0.8187


(a) Use the formulas (5.1.8) and (5.1.9) to find the first derivatives of the 
three numerical functions (at x = 1, 1.0472 and 0, respectively) and fill 
in the following table with the results. Also use the formulas (5.3.1) 
and (5.3.2) to find the second derivatives of the three functions (at 
x = 1, 1.0472 and 0, respectively) and fill in the following table with 
the results. 




                                   f1'(x)|x=1    f2'(x)|x=1.0472   f3'(x)|x=0
First derivative by Eq. (5.1.8)    1.0000e-02                      2.0000e-03
First derivative by Eq. (5.1.9)                  2.5000e-04

                                   f1''(x)|x=1   f2''(x)|x=1.0472  f3''(x)|x=0
Second derivative by Eq. (5.3.1)                 6.0254e-03
Second derivative by Eq. (5.3.2)   2.4869e-14                      8.3333e-04
(b) Based on the Lagrange/Newton polynomial matching the three/five
points around the target point, find the first/second derivatives of the
three functions (at x = 1, 1.0472 and 0, respectively) and fill in the
following table with the results.

                              f1'(x)|x=1    f2'(x)|x=1.0472   f3'(x)|x=0
First derivative on l2(x)                   1.0000e-03
First derivative on l4(x)     4.3201e-12                      4.1667e-04

                              f1''(x)|x=1   f2''(x)|x=1.0472  f3''(x)|x=0
Second derivative on l2(x)    1.4211e-14                      0.0000e+00
Second derivative on l4(x)                  6.8587e-03



5.3 First Derivative and Step-size 

Consider the routine “jacob()” in Section 4.6, which is used for computing
the Jacobian, that is, the first derivative of a vector function with respect
to a vector variable.

(a) Which one among the first derivative formulas in Section 5.1 is used
for computing the Jacobian in the routine “jacob()”?

(b) Expecting that a smaller step-size h would yield a better solution to the
problem given in Example 4.3, Bush changed h = 1e-4 to h = 1e-5 in
the routine “newtons()” and then typed the following statement into the
MATLAB command window. What solution could he get?

>>rn1 = newtons('phys',1e6,1e-4,100)

(c) What went against his expectation? Jessica diagnosed the trouble
as being caused by a singular Jacobian matrix and modified the statement
‘dx = -jacob()\fx(:)’ in the routine “newtons()” as follows. What solution
(to the problem in Example 4.3) do you get by using the modified
routine, that is, by typing the same statement as in (b)?

>>rn2 = newtons('phys',1e6,1e-4,100), phys(rn2)


J = jacob(f,xx(k,:),h,varargin{:});
if rank(J) < Nx
  k = k - 1;
  fprintf('Jacobian singular! det(J) = %12.6e\n',det(J)); break;
else
  dx = -J\fx(:); %-[dfdx]^-1*fx;
end
(d) To investigate how the accident of Jacobian singularity happened, add
h = 1e-5 to the (tentative) solution (rn2) obtained in (c). Does the
result differ from rn2? If not, why? (See Section 1.2.2 and Problem 1.19.)

>>rn2 + 1e-5 ~= rn2


(e) Charley thought that Jessica just circumvented the Jacobian singularity
problem. To remove the source of singularity, he modified the formula
(5.1.8) into

$$D_{c2}(x, h) = \frac{f((1+h)x) - f((1-h)x)}{2hx} \qquad (P5.3.1)$$

and implemented it in another routine “jacob1()” as follows.


function g = jacob1(f,x,h,varargin) %Jacobian of f(x)
if nargin < 3, h = .0001; end
h2 = 2*h; N = length(x); I = eye(N);
for n = 1:N
  if abs(x(n)) < .0001, x(n) = .0001; end
  delta = h*x(n);
  tmp = I(n,:)*delta;
  f1 = feval(f,x + tmp,varargin{:});
  f2 = feval(f,x - tmp,varargin{:});
  f12 = (f1 - f2)/2/delta; g(:,n) = f12(:);
end


With h = 1e-5 or h = 1e-6 and jacob() replaced by jacob1() in
the routine “newtons()”, type the same statement as in (c) to get a
solution to the problem in Example 4.3 together with its residual error
and check if his scheme works fine.

>>rn3 = newtons('phys',1e6,1e-4,100), phys(rn3)
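The failure in (b) and the remedy in (e) come down to floating-point spacing: once |x| is large, x + h can round back to x, so the absolute-step central difference returns 0 (and a Jacobian built from it turns singular), while the relative step δ = h·x used in “jacob1()” scales with x. A one-variable Python sketch; the test function x² and the point x = 1e12 are arbitrary illustrative choices, not taken from Example 4.3:

```python
def central_abs(f, x, h=1e-5):
    """Eq. (5.1.8) with an absolute step h."""
    return (f(x + h) - f(x - h)) / (2 * h)

def central_rel(f, x, h=1e-5):
    """Relative step as in jacob1(): delta = h*x, with |x| < 1e-4 replaced by 1e-4."""
    if abs(x) < 1e-4:
        x = 1e-4
    delta = h * x
    return (f(x + delta) - f(x - delta)) / (2 * delta)

square = lambda x: x * x
x0 = 1e12
print(x0 + 1e-5 == x0)           # True: the step is lost to rounding
print(central_abs(square, x0))   # 0.0, which looks like a zero derivative
print(central_rel(square, x0))   # close to the true value 2e12
```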


5.4 Numerical Integration of Basic Functions 

Compute the following integrals by using the trapezoidal rule, Simpson's
rule, and the Romberg method and fill in the following table with the
resulting errors.

(i) $\int_0^2 (x^3 - 2x)\,dx$     (ii) $\int_0^{\pi/2} \sin x\,dx$     (iii) $\int_0^1 e^{-x}\,dx$
N

Trapezoidal 

Rule 

Simpson 

Rule 

Romberg 
(tol = 0.0005) 

$\int_0^2 (x^3 - 2x)\,dx = 0$

4 


0.0000e+0 



6.2500e-l 



$\int_0^{\pi/2} \sin x\,dx = 1$

4 

1.2884e-2 


8.4345e-6 




8.2955e-6 


$\int_0^1 e^{-x}\,dx = 0.63212055883$

4 


1.3616e-5 


8 

8.2286e-4 




5.5 Adaptive Quadrature and Gaussian Quadrature for Improper Integral 
Consider the following two integrals. 


(i)
$$\int_0^1 \frac{1}{\sqrt{x}}\,dx = 2x^{1/2}\Big|_0^1 = 2 \qquad (P5.5.1)$$

(ii)
$$\int_{-1}^{1} \frac{1}{\sqrt{x}}\,dx = \int_{-1}^{0} \frac{1}{\sqrt{x}}\,dx + \int_{0}^{1} \frac{1}{\sqrt{x}}\,dx = 2 - 2i \qquad (P5.5.2)$$

(a) Type the following statements into the MATLAB command window to
use the integration routines for the above integrals. What did you get?
If something is wrong, what do you think caused it?


>>f = inline('1./sqrt(x)','x'); % define the integrand function
>>smpsns(f,0,1,100) % integral over [0,1] with 100 segments
>>rmbrg(f,0,1,1e-4) % with error tolerance = 0.0001
>>adapt_smpsn(f,0,1,1e-4) % with error tolerance = 0.0001
>>gauss_legendre(f,0,1,20) %Gauss-Legendre with N = 20 grid points
>>quad(f,0,1) % MATLAB built-in routine
>>quad8(f,0,1) % MATLAB 5.x built-in routine
>>adapt_smpsn(f,-1,1,1e-4) %integral over [-1,1]
>>quad(f,-1,1) % MATLAB built-in routine
>>quadl(f,-1,1) % MATLAB built-in routine


(b) Itha decided to retry the routine “smpsns()”, but with the singular point
excluded from the integration interval. In order to do that, she replaced
the singular point (0), which is the lower bound of the integration interval
[0,1], by 10^-4 or 10^-5, and typed the following statements into the
MATLAB command window.

>>smpsns(f,1e-4,1,100)
>>smpsns(f,1e-5,1,100)
>>smpsns(f,1e-5,1,1e4)
>>smpsns(f,1e-4,1,1e3)
>>smpsns(f,1e-4,1,1e4)

What are the results? Will it be better if you make the lower bound of
the integration interval closer to zero (0), without increasing the number
of segments or (equivalently) decreasing the segment width? How about
increasing the number of segments without making the lower bound
of the integration interval closer to the original lower bound, which is
zero (0)?

(c) For the purpose of improving the performance of “adapt_smpsn()”,
Vania would put the following statements into both of the routines
“smpsns()” and “adapt_smpsn()”. Supplement the routines and check
whether her idea works or not.

EPS = 1e-12; fa = feval(f,a,varargin{:});
if isnan(fa)|abs(fa) == inf, a = a + max(abs(a)*EPS,EPS); end
fb = feval(f,b,varargin{:});

?? ??????????????? ?? ????? ? ?? ? ? ???????????????????? ??? 


5.6 Various Numerical Integration Methods and Improper Integral 


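Consider the following integrals.

$$\int_0^{\infty} \frac{\sin x}{x}\,dx = \frac{\pi}{2} \qquad (P5.6.1)$$

$$\int_0^{\infty} e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2} \qquad (P5.6.2)$$

Note that the true values of these integrals can be obtained by using the
symbolic computation command “int()” as below.

>>syms x, int(sin(x)/x,0,inf)
>>int(exp(-x^2),0,inf)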


(cf) Don’t you believe it without seeing it? Blessed are those who have not seen 
and yet believe. 

(a) To apply the routines like “smpsns()”, “adapt_smpsn()”, “Gauss_Legendre()”,
and “quadl()” for evaluating the integral (P5.6.1), do the following.

(i) Note that the integration interval [0, ∞) can be changed into a
finite interval as below.

$$\int_0^{\infty} \frac{\sin x}{x}\,dx = \int_0^1 \frac{\sin x}{x}\,dx + \int_1^{\infty} \frac{\sin x}{x}\,dx \overset{x=1/y}{=} \int_0^1 \frac{\sin x}{x}\,dx + \int_0^1 \frac{\sin(1/y)}{y}\,dy \qquad (P5.6.3)$$

(ii) Add the block of statements in P5.5(c) into the routines “smpsns()”
and “adapt_smpsn()” to make them cope with the cases of NaN
(Not-a-Number) and Inf (Infinity).

(iii) Supplement the program “nm5p06a.m” so that the various routines
are applied for computing the integrals (P5.6.1) and (P5.6.3), where
the parameters like the number of segments (N = 200), the error
tolerance (tol = 1e-4), and the number of grid points (MGL = 20)
are supposed to be used as they are in the program. Noting that
the second integrand function in (P5.6.3) oscillates wildly, with
higher frequency and larger amplitude as y gets closer to zero (0),
set the lower bound of the integration interval to a2 = 0.001.

(iv) Run the supplemented program and fill in Table P5.6 with the 
absolute errors of the results. 



%nm5p06b
warning off MATLAB:divideByZero
fp56b = inline('exp(-x.*x)','x');
fp56b1 = inline('ones(size(x))','x');
fp56b2 = inline('exp(-1./y./y)./y./y','y');
a = 0; b = 200; N = 200; tol = 1e-4; IT = sqrt(pi)/2;
a1 = 0; b1 = 1; a2 = 0; b2 = 1; MGH = 2;
e_s = smpsns(fp56b,a,b,N) - IT
e_as = adapt_smpsn(fp56b,a,b,tol) - IT
e_q = quad(fp56b,a,b,tol) - IT
e_GH = Gauss_Hermite(fp56b1,MGH)/2 - IT
e_ss = smpsns(fp56b,a1,b1,N) + smpsns(fp56b2,a2,b2,N) - IT
Iasas = adapt_smpsn(fp56b,a1,b1,tol) + ...
   +????????????????????????????? - IT
e_qq = quad(fp56b,a1,b1,tol) + ????????????????????????? - IT
warning on MATLAB:divideByZero
Table P5.6 Results of Applying Various Numerical Integration Methods for
Improper Integrals 


(b) To apply the routines like “smpsns()”, “adapt_smpsn()”, “quad()”, and
“Gauss_Hermite()” for evaluating the integral (P5.6.2), do the following.

(i) Note that the integration interval [0, ∞) can be changed into a
finite interval as below.

$$\int_0^{\infty} e^{-x^2}\,dx = \int_0^1 e^{-x^2}\,dx + \int_1^{\infty} e^{-x^2}\,dx \overset{x=1/y}{=} \int_0^1 e^{-x^2}\,dx + \int_0^1 \frac{e^{-1/y^2}}{y^2}\,dy \qquad (P5.6.4)$$

(ii) Compose the incomplete routine “Gauss_Hermite” like “Gauss_Legendre”,
which performs the Gauss-Hermite integration introduced in Section 5.9.2.

(iii) Supplement the program “nm5p06b.m” so that the various routines
are applied for computing the integrals (P5.6.2) and (P5.6.4), where
the parameters like the number of segments (N = 200), the error
tolerance (tol = 1e-4), and the number of grid points (MGH = 2)
are supposed to be used as they are in the program. Note that the
integration interval is not (-∞, ∞) like that of Eq. (5.9.12), but
[0, ∞), and so you should cut the result of “Gauss_Hermite()” by
half to get the right answer for the integral (P5.6.2).

(iv) Run the supplemented program and fill in Table P5.6 with the 
absolute errors of the results. 
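The splitting of [0, ∞) into [0, 1] and [1, ∞), with x = 1/y on the second piece, can be checked numerically. Below is a small Python sketch (not the book's nm5p06b.m) that applies Simpson's rule with N = 200 to both pieces of the integral of $e^{-x^2}$; the substituted integrand $e^{-1/y^2}/y^2$ is set to its limit 0 at y = 0, which is the same effect the NaN/Inf handling of P5.5(c) aims at. The reference value for the tail uses the identity $\int_1^{\infty} e^{-x^2}dx = (\sqrt{\pi}/2)\,\mathrm{erfc}(1)$.

```python
import math

def simpson(g, a, b, n):
    """Composite Simpson's rule with an even number n of segments."""
    h = (b - a) / n
    total = g(a) + g(b)
    total += 4.0 * sum(g(a + k * h) for k in range(1, n, 2))
    total += 2.0 * sum(g(a + k * h) for k in range(2, n, 2))
    return h / 3.0 * total

def tail_integrand(y):
    """exp(-x^2) after x = 1/y, dx = -dy/y^2; its limit at y = 0 is 0."""
    return 0.0 if y == 0.0 else math.exp(-1.0 / (y * y)) / (y * y)

N = 200
head = simpson(lambda x: math.exp(-x * x), 0.0, 1.0, N)   # part over [0, 1]
tail = simpson(tail_integrand, 0.0, 1.0, N)               # part over [1, inf), mapped to (0, 1]
total = head + tail
print(total, math.sqrt(math.pi) / 2, total - math.sqrt(math.pi) / 2)
```

The substituted integrand is very flat near y = 0 (all its derivatives vanish there), which is why a plain Simpson rule handles it well, unlike the wildly oscillating second integrand of (P5.6.3).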

(c) Based on the results listed in Table P5.6, answer the following questions:

(i) Among the routines “smpsns()”, “adapt_smpsn()”, “quad()”,
and “Gauss()”, choose the best two for (P5.6.1) and (P5.6.2),
respectively.

(ii) The routine “Gauss_Legendre()” works (badly, perfectly) even
with as many as 20 grid points for (P5.6.1), while the routine
“Gauss_Hermite()” works (perfectly, badly) with just two grid
points for (P5.6.2). It is because the integrand function of (P5.6.1)
is (far from, just like) a polynomial, while (P5.6.2) matches
Eq. (5.9.12) and the part of it excluding $e^{-x^2}$ is (just like, far
from) a polynomial.


function I = Gauss_Hermite(f,N,varargin)
[t,w] = ???????(N);
ft = feval(f,t,varargin{:});
I = w*ft';


(iii) Run the following program “nm5p06c.m” to see the shapes of the
integrand functions of (P5.6.1) and (P5.6.2) and the second integral
of (P5.6.3). You can zoom in/out the graphs by clicking the
Tools/Zoom_in menu and then clicking any point on the graphs
with the left/right mouse button in the MATLAB graphic window.
Which one is oscillating furiously? Which one is oscillating
moderately? Which one is just changing abruptly?


%nm5p06c
clf
fp56a = inline('sin(x)./x','x');
fp56a2 = inline('sin(1./y)./y','y');
fp56b = inline('exp(-x.*x)','x');
x0 = [eps:2000]/20; x = [eps:100]/100;
subplot(221), plot(x0,fp56a(x0))
subplot(223), plot(x0,fp56b(x0))
subplot(222), y = logspace(-3,0,2000); loglog(y,abs(fp56a2(y)))
subplot(224), y = logspace(-6,-3,2000); loglog(y,abs(fp56a2(y)))


(iv) The adaptive integration routines like “adapt_smpsn()” and
“quad()” work (badly, fine) for (P5.6.1), but (fine, badly) for
(P5.6.2). From this fact, we might conjecture that the adaptive
integration routines may be (ineffective, effective) for the integrand
functions which have many oscillations, while they may be
(effective, ineffective) for the integrand functions which have
abruptly changing slope. To support this conjecture, run the
following program “nm5p06d”, which uses the “quad()” routine
for the integrals

$$\int_1^{b} \frac{\sin x}{x}\,dx \quad \text{with } b = 100, 1000, 10000, \ldots \qquad (P5.6.5a)$$

$$\int_{a}^{1} \frac{\sin(1/y)}{y}\,dy \quad \text{with } a = 0.001, 0.0001, 0.00001, \ldots \qquad (P5.6.5b)$$
%nm5p06d
fp56a = inline('sin(x)./x','x');
fp56a2 = inline('sin(1./y)./y','y');
syms x
IT2 = pi/2 - double(int(sin(x)/x,0,1)) %true value of the integral
disp('Change of upper limit of the integration interval')
a = 1; b = [100 1e3 1e4 1e7]; tol = 1e-4;
for i = 1:length(b)
  Iq2 = quad(fp56a,a,b(i),tol);
  fprintf('With b = %12.4e, err_Iq = %12.4e\n', b(i),Iq2 - IT2);
end
disp('Change of lower limit of the integration interval')
a2 = [1e-3 1e-4 1e-5 1e-6 0]; b2 = 1; tol = 1e-4;
for i = 1:5
  Iq2 = quad(fp56a2,a2(i),b2,tol);
  fprintf('With a2=%12.4e, err_Iq=%12.4e\n', a2(i),Iq2 - IT2);
end


Does the “quad()” routine work stably for (P5.6.5a) with the
changing value of the upper bound of the integration interval?
Does it work stably for (P5.6.5b) with the changing value of the
lower bound of the integration interval? Do the results support or
defy the conjecture?

(cf) This problem warns us that it may not be good to use only one routine
for a computational task and suggests that we use more than one method
for cross-checking.

5.7 Gauss-Hermite Integration Method
Consider the following integral:

$$\int_0^{\infty} e^{-x^2}\cos x\,dx = \frac{\sqrt{\pi}}{2}e^{-1/4} \qquad (P5.7.1)$$

Select a Gauss quadrature suitable for this integral and apply it with
the number of grid points N = 4 as well as the routines “smpsns()”,
“adapt_smpsn()”, “quad()”, and “quadl()” to evaluate the integral. In
order to compare the number of floating-point operations required to achieve
almost the same level of accuracy, set the number of segments for the Simpson
method to N = 700 and the error tolerance for all other routines to tol =
10^-5. Fill in Table P5.7 with the error results.


Table P5.7 The Results of Applying Various Numerical Integration Methods 
5.8 Gauss-Laguerre Integration Method

(a) As in Section 5.9.1, Section 5.9.2, and Problem 5.6(b), compose the
MATLAB routines: “Laguerp()”, which generates the Laguerre polynomial
(5.9.18); “Gausslgp()”, which finds the grid points $t_i$'s and the
coefficients $w_{N,i}$'s for the Gauss-Laguerre integration formula (5.9.16); and
“Gauss_Laguerre(f,N)”, which uses these two routines to carry out
the Gauss-Laguerre integration method.

(b) Consider the following integral:

$$\int_0^{\infty} e^{-t}\,t\,dt = -e^{-t}t\Big|_0^{\infty} + \int_0^{\infty} e^{-t}\,dt = -e^{-t}\Big|_0^{\infty} = 1 \qquad (P5.8.1)$$

Noting that this integral matches Eq. (5.9.17) with f(t) = t, so that
the Gauss-Laguerre method is the right choice, apply the routine
“Gauss_Laguerre(f,N)” (manufactured in (a)) with N = 2 as well as
the routines “smpsns()”, “adapt_smpsn()”, “quad()”, and “quadl()”
for evaluating the integral and fill in Table P5.7 with the error results.
Which turns out to be the best? Is the performance of “quad()”
improved by lowering the error tolerance?

(cf) This illustrates that the routine “adapt_smpsn()” sometimes outperforms the
MATLAB built-in routine “quad()” with fewer computations. On the other
hand, Table P5.7 shows that it is most desirable to apply one of the Gauss
quadrature schemes whenever it is applicable to the integration problem.

5.9 Numerical Integrals 


Consider the following integrals. 

(1) f^ 2 x sin x dx = 1 (2) / (J ' x lnfsin jc) dx = —-7T 2 In 2 


;(1 — Inx) 2 


(3) Jo - 

(5) ^^)^ = | 
(7) = 

(9) / 0 °° x 2 e- x cos xdx = -\- 


(4) /r 
(6) it 


c(l + lnv ) 2 


v£(i + *) 


(8) / 0 °° *Jxe *dx = ^- 


(a) Apply the integration routines “smpsns()” (with N = 10^4), “adapt_smpsn()”,
“quad()”, “quadl()” (tol = 10^-6) and “Gauss_Legendre()” (Section 5.9.1)
or “Gauss_Laguerre()” (Problem 5.8) (with N = 15) to compute the above
integrals and fill in Table P5.9 with the relative errors. Use the upper/lower
bounds of the integration interval in Table P5.9 if they are specified in the table.

(b) Based on the results listed in Table P5.9, answer the following questions
or circle the right answer.

(i) From the fact that the Gauss-Legendre integration scheme worked
best only for (1), it is implied that the scheme is (recommendable,
not recommendable) for the case where the integrand function is
far from being approximated by a polynomial.

(ii) From the fact that the Gauss-Laguerre integration scheme worked
best only for (9), it is implied that the scheme is (recommendable,
not recommendable) for the case where the integrand function
excluding the multiplying term $e^{-x}$ is far from being approximated
by a polynomial.

(iii) Note the following:

• The integrals (3) and (4) can be converted into each other by a
variable substitution of $x = u^{-1}$, $dx = -u^{-2}\,du$. The integrals
(5) and (6) have the same relationship.

• The integrals (7) and (8) can be converted into each other by a
variable substitution of $u = e^{-x}$, $dx = -u^{-1}\,du$.

From the results for (3)-(8), it can be conjectured that the numerical integration
may work (better, worse) if the integration interval is changed from [1, ∞)
into (0, 1] through the substitution of variable like

$$x = u^{-n},\; dx = -n\,u^{-(n+1)}\,du \quad \text{or} \quad u = e^{-nx},\; dx = -(nu)^{-1}\,du \qquad (P5.9.1)$$
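Written out for a generic integrand g(x) (our notation), the two substitutions of (P5.9.1) map an infinite interval onto (0, 1] as follows; the minus sign of dx is absorbed by swapping the limits of integration:

$$\int_1^{\infty} g(x)\,dx = \int_0^1 g\!\left(u^{-n}\right) n\,u^{-(n+1)}\,du \qquad (x = u^{-n})$$

$$\int_0^{\infty} g(x)\,dx = \int_0^1 g\!\left(-\frac{\ln u}{n}\right)\frac{du}{n\,u} \qquad (u = e^{-nx})$$

With n = 1 these are the substitutions listed in (iii).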


Table P5.9 Results of Applying Various Numerical Integration Methods


>.7850e-02 (a = 10" 4 


.2702e-02 (a = IQ- 4 


5.10 The BER (Bit Error Rate) Curve of Communication with Multidimensional
Signaling

For a communication system with multidimensional (orthogonal) signaling,
the BER, that is, the probability of bit error, is derived as

$$P_{e,b} = \frac{2^{b-1}}{2^b - 1}\left(1 - \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} Q^{M-1}\!\left(-\sqrt{2}\,y - \sqrt{b\,SNR}\right)e^{-y^2}\,dy\right) \qquad (P5.10.1)$$

where b is the number of bits, M = 2^b is the number of orthogonal waveforms,
SNR is the signal-to-noise ratio, and Q(·) is the Gaussian tail (Q-) function
defined by

$$Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-y^2/2}\,dy \qquad (P5.10.2)$$

We want to plot the BER curves for SNR = 0:10[dB] and b = 1:4.

(a) Consider the following program “nm5p10.m”, whose objective is to
compute the values of P_{e,b}(SNR, b) for SNR = 0:10[dB] and b = 1:4
by using the routine “Gauss_Hermite()” (Problem 5.6) and also by
using the MATLAB built-in routine “quad()”, and to plot them versus
SNR[dB] = 10 log10 SNR. Complete the incomplete part, which computes
the integral in (P5.10.1) over [-1000, 1000], and run the program
to obtain the BER curves like Fig. P5.10.

(b) Of the two routines, which one is faster and which one presents us with
more reliable values of the integral in (P5.10.1)?



%nm5p10.m: plots the probability of bit error versus SNRbdB
fs = 'Q(-sqrt(2)*x - sqrt(b*SNR)).^(2^b - 1)';
Q = inline('erfc(x/sqrt(2))/2','x');
f = inline(fs,'x','SNR','b');
fex2 = inline([fs '.*exp(-x.*x)'],'x','SNR','b');
SNRdB = 0:10; tol = 1e-4; % SNR[dB] and tolerance used for 'quad'
for b = 1:4
  tmp = 2^(b - 1)/(2^b - 1); spi = sqrt(pi);
  for i = 1:length(SNRdB),
    SNR = 10^(SNRdB(i)/10);
    Pe(i) = tmp*(1 - Gauss_Hermite(f,10,SNR,b)/spi);
    Pe1(i) = tmp*(1 - quad(fex2,-10,10,tol,[],SNR,b)/spi);
    Pe2(i) = tmp*(1 - ?????????????????????????????????)/spi);
  end
  semilogy(SNRdB,Pe,'ko',SNRdB,Pe1,'b+:',SNRdB,Pe2,'r.-'), hold on
end
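For b = 1 (M = 2) the integral in (P5.10.1) can be done in closed form: with z = √2·y it becomes the expectation of Q(-Z - √SNR) over a standard normal Z, which gives $P_{e,1} = Q(\sqrt{SNR/2})$ (our own derivation, not stated in the problem). That makes a convenient check on any numerical evaluation. The Python sketch below mirrors the 'quad' line of nm5p10.m by integrating over the truncated interval [-10, 10], but with a fixed 2000-segment Simpson rule in place of quad(); the function names are our own.

```python
import math

def q_func(x):
    """Q(x) = erfc(x/sqrt(2))/2, the same definition as Q in nm5p10.m."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def simpson(g, a, b, n):
    """Composite Simpson's rule with an even number n of segments."""
    h = (b - a) / n
    total = g(a) + g(b)
    total += 4.0 * sum(g(a + k * h) for k in range(1, n, 2))
    total += 2.0 * sum(g(a + k * h) for k in range(2, n, 2))
    return h / 3.0 * total

def ber(snr, b, lo=-10.0, hi=10.0, n=2000):
    """Bit error probability (P5.10.1) for M = 2^b orthogonal signals at linear SNR."""
    m = 2 ** b
    integrand = lambda y: q_func(-math.sqrt(2.0) * y - math.sqrt(b * snr)) ** (m - 1) * math.exp(-y * y)
    return 2 ** (b - 1) / (2 ** b - 1) * (1.0 - simpson(integrand, lo, hi, n) / math.sqrt(math.pi))

for b in range(1, 5):
    print(b, ["%.3e" % ber(10 ** (db / 10), b) for db in range(0, 11, 5)])
```

Because the integrand is smooth and decays like e^{-y^2}, truncating to [-10, 10] loses nothing visible in double precision; the wider interval [-1000, 1000] asked for in (a) mainly tests how an adaptive routine copes with a mostly empty interval.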
5.11 Length of Curve/Arc: Superb Harmony of Numerical Derivative/Integral

The graph of a function y = f(x) of a variable x is generally a curve, and
its length over the interval [a, b] on the x-axis can be described by a line
integral as

$$I = \int_a^b dl = \int_a^b \sqrt{dx^2 + dy^2} = \int_a^b \sqrt{1 + (dy/dx)^2}\,dx = \int_a^b \sqrt{1 + (f'(x))^2}\,dx \qquad (P5.11.1)$$

For example, the length of the half-circumference of a circle with the radius
of unit length can be obtained from this line integral with

$$y = f(x) = \sqrt{1 - x^2}, \quad a = -1, \quad b = 1 \qquad (P5.11.2)$$

Starting from the program “nm5p11.m”, make a program that uses the
numerical integration routines “smpsns()”, “adapt_smpsn()”, “quad()”,
“quadl()”, and “Gauss_Legendre()” to evaluate the integral (P5.11.1,2)
with the first derivative approximated by Eq. (5.1.8), where the parameters
like the number of segments (N), the error tolerance (tol), and the
number of grid points (M) are supposed to be as they are in the program.
Run the program with the step size h = 0.001, 0.0001, and 0.00001
in the numerical derivative and fill in Table P5.11 with the errors of the
results, noting that the true value of the half-circumference of a unit circle
is π.


%nm5p11
a = -1; b = 1; % the lower/upper bounds of the integration interval
N = 1000 % the number of segments for the Simpson method
tol = 1e-6 % the error tolerance
M = 20 % the number of grid points for Gauss-Legendre integration
IT = pi; h = 1e-3 % true integral and step size for numerical derivative
flength = inline('sqrt(1 + dfp511(x,h).^2)','x','h'); %integrand (P5.11.1)
Is = smpsns(flength,a,b,N,h);
[Ias,points,err] = adapt_smpsn(flength,a,b,tol,h);
Iq = quad(flength,a,b,tol,[],h);
Iql = quadl(flength,a,b,tol,[],h);
IGL = Gauss_Legendre(flength,a,b,M,h);

function df = dfp511(x,h) % numerical derivative of (P5.11.2)
if nargin < 2, h = 0.001; end
df = (fp511(x + h) - fp511(x - h))/2/h; %Eq.(5.1.8)

function y = fp511(x)
y = sqrt(max(1 - x.*x,0)); % the function (P5.11.2)
Table P5.11 Results of Applying Various Numerical Integration Methods for
(P5.11.1,2)/(P5.12.1,2)



Step-size h 

Simpson 

Adaptive 

quad 

quadl 

Gauss 

(P5.11.1,2)

0.001 

4.6212e-2 


2.9822e-2 


8.4103e-2 

0.0001 


9.4278e-3 


9.4277e-3 


0.00001 

2.1853e-l 


2.9858e-3 


8.4937e-2 

(P5.12.1,2) 

0.001 


1.2393e-5 


1.3545e-5 


0.0001 

8.3626e-3 


5.0315e-6 


6.4849e-6 

0.00001 


1.3846e-9 


8.8255e-7 


(P5.13.1) 

N/A 

8.8818e-16 


0 


8.8818e-16 


5.12 Surface Area of Revolutionary 3-D (Cubic) Object 

The upper/lower surface area of a 3-D structure formed by one revolution of 
a graph (curve) of a function y = f(x ) around the jc-axis over the interval 
[a, b\ can be described by the following integral: 

I = 2n j ydl = 2n j f (xWl + (f (x)) 2 dx (P5.12.1) 

For example, the surface area of a sphere with the radius of unit length can 
be obtained from this equation with 

y = f(x) = Vl~x 2 , a = —1, b= 1 (P5.12.2) 

Starting from the program “nm5p11.m”, make a program “nm5p12.m” that
uses the numerical integration routines “smpsns()” (with the number of
segments N = 1000), “adapt_smpsn()”, “quad()”, “quadl()” (with the
error tolerance tol = $10^{-6}$), and “Gauss_Legendre()” (with the number
of grid points M = 20) to evaluate the integral (P5.12.1,2) with the first
derivative approximated by Eq. (5.1.8), where parameters such as the number
of segments (N), the error tolerance (tol), and the number of grid points
(M) are to be kept as they are in the program. Run the program with
the step size h = 0.001, 0.0001, and 0.00001 in the numerical derivative
and fill in Table P5.11 with the errors of the results, noting that the true
value of the surface area of a unit sphere is $4\pi$.
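As a rough illustration of what “nm5p12.m” is asked to compute, here is a minimal Python sketch (not the book's MATLAB routines) that evaluates (P5.12.1,2) with a composite Simpson rule of N = 1000 segments and the central-difference derivative of Eq. (5.1.8) with h = 0.001. The helper names `fp512`, `dfp512`, `simpson`, and `surface_integrand` are hypothetical, chosen to mirror “fp511()”/“dfp511()”. The clipping `max(1 - x*x, 0)` follows “fp511()”.

```python
import math

def fp512(x):
    """f(x) = sqrt(1 - x^2), clipped at 0 outside [-1, 1] as in fp511()."""
    return math.sqrt(max(1.0 - x * x, 0.0))

def dfp512(x, h=1e-3):
    """Central-difference derivative of fp512, Eq. (5.1.8)."""
    return (fp512(x + h) - fp512(x - h)) / (2.0 * h)

def simpson(g, a, b, n):
    """Composite Simpson's rule for g on [a, b] with an even number n of segments."""
    if n % 2:
        raise ValueError("n must be even")
    dx = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * dx)
    return total * dx / 3.0

def surface_integrand(x, h=1e-3):
    """2*pi*f(x)*sqrt(1 + f'(x)^2), the integrand of (P5.12.1) with a numerical f'."""
    return 2.0 * math.pi * fp512(x) * math.sqrt(1.0 + dfp512(x, h) ** 2)

area = simpson(surface_integrand, -1.0, 1.0, 1000)
print(area, abs(area - 4.0 * math.pi))
```

For |x| < 1 the exact integrand is $f(x)\sqrt{1+(f'(x))^2} = 1$ (times $2\pi$). Most of the error in this sketch therefore comes from the points near x = ±1. There, f(x) approaches 0 and the slope becomes singular, so the numerical derivative cannot follow it.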

5.13 Volume of Revolutionary 3-D (Cubic) Object 

The volume of a 3-D structure formed by one revolution of a graph (curve)
of a function y = f(x) around the x-axis over the interval [a, b] can be
described by the following integral:

$$I = \pi\int_a^b f^2(x)\,dx \qquad \text{(P5.13.1)}$$
For example, the volume of a sphere with unit radius (Fig.
P5.13) can be obtained from this equation with Eq. (P5.12.2). Starting from
the program “nm5p11.m”, make a program “nm5p13.m” that uses the numerical
integration routines “smpsns()” (with the number of segments N =
100), “adapt_smpsn()”, “quad()”, “quadl()” (with the error tolerance
tol = $10^{-6}$), and “Gauss_Legendre()” (with the number of grid points
M = 2) to evaluate the integral (P5.13.1). Run the program and fill in
Table P5.11 with the errors of the results, noting that the volume of a
unit sphere is $4\pi/3$.
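A minimal Python sketch of (P5.13.1) for the unit sphere is shown below. It uses a composite Simpson rule with N = 100 and a two-point Gauss-Legendre rule (M = 2). The integrand π(1 − x²) is a quadratic, so both rules are exact apart from rounding. This is consistent with the tiny errors, such as 8.8818e-16, reported in Table P5.11. The function names are hypothetical stand-ins, not the book's “smpsns()” or “Gauss_Legendre()”.

```python
import math

def simpson(g, a, b, n):
    """Composite Simpson's rule for g on [a, b] with an even number n of segments."""
    if n % 2:
        raise ValueError("n must be even")
    dx = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * dx)
    return total * dx / 3.0

def gauss_legendre_2(g, a, b):
    """Two-point Gauss-Legendre rule (nodes +-1/sqrt(3), weights 1) mapped to [a, b]."""
    mid, half = (a + b) / 2.0, (b - a) / 2.0
    node = 1.0 / math.sqrt(3.0)
    return half * (g(mid - half * node) + g(mid + half * node))

def volume_integrand(x):
    """pi * f(x)^2 with f(x) = sqrt(1 - x^2), i.e. the integrand of (P5.13.1)."""
    return math.pi * (1.0 - x * x)

exact = 4.0 * math.pi / 3.0
vs = simpson(volume_integrand, -1.0, 1.0, 100)
vg = gauss_legendre_2(volume_integrand, -1.0, 1.0)
print(abs(vs - exact), abs(vg - exact))
```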



5.14 Double Integral 

(a) Consider the following double integral

$$I = \int_0^2\int_0^{\pi} y\sin x\,dx\,dy = \int_0^2 \Big[-y\cos x\Big]_0^{\pi}\,dy = \int_0^2 2y\,dy = 4 \qquad \text{(P5.14.1)}$$

Use the routine “int2s()” (Section 5.10) with M = N = 20, M = N =
50, and M = N = 100 and the MATLAB built-in routine “dblquad()”
to compute this double integral. Fill in Table P5.14.1 with the results
and the computation times measured with the commands tic/toc.
Based on the results listed in Table P5.14.1, can we say that the
numerical error becomes smaller as we increase the numbers (M,N) of
segments along the x-axis and y-axis for the routine “int2s()”?
(b) Consider the following double integral:

$$I = \int_0^1\int_0^1 \frac{1}{1 - xy}\,dx\,dy = \frac{\pi^2}{6} \qquad \text{(P5.14.2)}$$

Noting that the integrand function is singular at (x, y) = (1, 1), use
the routine “int2s()” and the MATLAB built-in routine “dblquad()”
with the upper limit (d) of the integration interval along the y-axis set to
d = 0.999, d = 0.9999, d = 0.99999, and d = 0.999999 to compute this
double integral. Fill in Tables P5.14.2 and P5.14.3 with the results and
the computation times measured with the commands tic/toc.


Table P5.14.1 Results of Running “int2s()” and “dblquad()” for (P5.14.1)

         int2s(),      int2s(),      int2s(),      dblquad()
         M = N = 20    M = N = 100   M = N = 200
|error|                2.1649×10^-8                1.3250×10^-8
time

Table P5.14.2 Results of Running “int2s()” and “dblquad()” for (P5.14.2)

int2s() with M = 2000, N = 2000


Table P5.14.3 Results of Running the Double Integral Routine “int2s()” for
(P5.14.2)

a = 0, b = 1           M = 1000,   M = 2000,   M = 5000,
c = 0, d = 1 - 10^-4   N = 1000    N = 2000    N = 5000
|error|                0.0003
time

Based on the results listed in Tables P5.14.2 and P5.14.3, answer the 
following questions. 

(i) Can we say that the numerical error becomes smaller as we set the 
upper limit (d) of the integration interval along the y-axis closer to 
the true limit 1? 
(ii) Can we say that the numerical error becomes smaller as we increase
the numbers (M,N) of segments along the x-axis and y-axis for the
routine “int2s()”? If this is contrary to the case of (a), can the
unusual shape of the integrand function in Eq. (P5.14.2) be blamed
for this behavior?

(cf) Note that the computation times to be listed in Tables P5.14.1 to P5.14.3
may vary with the speed of the CPU as well as with other jobs running
concurrently on it. Therefore, the time measured by the ‘tic/toc’ commands
is not an exact measure of the computational load of each routine.
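The routine “int2s()” is not listed in this section. As an assumption, the Python sketch below uses a tensor-product composite Simpson rule (hypothetical helpers `simpson_weights` and `simpson2d`) to evaluate (P5.14.1) from Problem 5.14(a). For a smooth integrand like y sin x, the error of such a rule should shrink roughly as h⁴ as M = N grows. Going from 20 to 100 segments should therefore reduce it by a factor of about 5⁴ = 625.

```python
import math

def simpson_weights(n):
    """Composite Simpson weights 1, 4, 2, 4, ..., 2, 4, 1 for an even number n of segments."""
    if n % 2:
        raise ValueError("n must be even")
    return [1 if i in (0, n) else (4 if i % 2 else 2) for i in range(n + 1)]

def simpson2d(fxy, a, b, c, d, m, n):
    """Tensor-product Simpson rule for the integral of fxy(x, y) over [a, b] x [c, d]."""
    hx, hy = (b - a) / m, (d - c) / n
    wx, wy = simpson_weights(m), simpson_weights(n)
    total = 0.0
    for i, wi in enumerate(wx):
        x = a + i * hx
        for j, wj in enumerate(wy):
            total += wi * wj * fxy(x, c + j * hy)
    return total * hx * hy / 9.0

def integrand(x, y):
    """Integrand of (P5.14.1)."""
    return y * math.sin(x)

errors = {mn: abs(simpson2d(integrand, 0.0, math.pi, 0.0, 2.0, mn, mn) - 4.0)
          for mn in (20, 100)}
print(errors)
```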

5.15 Area of a Triangle 

Consider how to find the area between the graph (curve) of a function f(x)
and the x-axis. For example, let f(x) = x for 0 ≤ x ≤ 1 in order to find
the area of a right-angled triangle with two equal sides of unit length. We
might use either the 1-D integration or the 2-D integration (that is, the
double integral) for this job.

(a) Use any integration method that you like best to evaluate the integral

$$I_1 = \int_0^1 f(x)\,dx = \int_0^1 x\,dx \qquad \text{(P5.15.1)}$$
(b) Use any double integration routine that you like best to evaluate the
integral

$$I_2 = \int_0^1\int_0^{f(x)} 1\,dy\,dx = \int_0^1\int_0^{x} 1\,dy\,dx \qquad \text{(P5.15.2)}$$

You may be puzzled by a problem when applying the routine
“int2s()” if you define the integrand function as

>>fp515b = inline('1','x','y');

This is because this function, being called inside the routine
“smpsns_fxy()”, yields just a scalar output even for a vector-valued
input argument. There are two remedies for this problem. One is to
define the integrand function in such a way that it generates an
output of the same dimension as the input.

>>fp515b = inline('1+0*(x+y)','x','y');

But this wastes computation time on the dead multiplication for each
element of the input arguments x and y. The other is to modify the
routine “smpsns_fxy()” in such a way that it avoids the vector operation.
More specifically, you can replace the vectorized evaluation of the integrand
in the routine with an element-by-element (scalar) evaluation. But this
remedy also increases the computation time, since it abandons vector
operations, which take less time than scalar operations (see Section 1.3).
(cf) This problem illustrates that we must be careful when relying on vector
operations, especially when defining a MATLAB function.


5.16 Volume of a Cone 

As in Section 5.10, modify the program “nm510.m” so that it uses
the routines “int2s()” and “dblquad()” to compute the volume of a cone
that has a unit circle as its base and a unit height, and run it to obtain
the values of the volume to four decimal places.







ORDINARY DIFFERENTIAL EQUATIONS


Differential equations are mathematical descriptions of how the variables and
their derivatives (rates of change) with respect to one or more independent
variables affect each other in a dynamical way. Their solutions show us how
the dependent variable(s) will change with the independent variable(s). Many
problems in natural sciences and engineering fields are formulated as a scalar
differential equation or a vector differential equation, that is, a system of
differential equations.

In this chapter, we look into several methods of obtaining the numerical
solutions to ordinary differential equations (ODEs) in which all dependent variables
(x) depend on a single independent variable (t). First, the initial value problems
(IVPs) will be handled with several methods, including the Runge-Kutta method and
predictor-corrector methods, in Sections 6.1 to 6.5. The final section (Section 6.6)
will introduce the shooting method and the finite difference method for solving
the two-point boundary value problem (BVP). ODEs are called an IVP if the
values $x(t_0)$ of the dependent variables are given at the initial point $t_0$ of the
independent variable, while they are called a BVP if the values $x(t_0)/x(t_f)$ are given
at the initial/final points $t_0$ and $t_f$.


6.1 EULER’S METHOD 

When talking about the numerical solutions to ODEs, everyone starts with
Euler’s method, since it is easy to understand and simple to program. Even though
its low accuracy keeps it from being widely used for solving ODEs, it conveys
the basic concept of the numerical solution of a differential equation simply
and clearly. Let’s consider a first-order differential equation:

$$y'(t) + a\,y(t) = r \quad \text{with } y(0) = y_0 \qquad \text{(6.1.1)}$$


It has the following form of analytical solution:

$$y(t) = \left(y_0 - \frac{r}{a}\right)e^{-at} + \frac{r}{a} \qquad \text{(6.1.2)}$$


which can be obtained by using a conventional method or the Laplace transform
technique [K-1, Chapter 5]. However, such a nice analytical solution does
not exist for every differential equation. Even when it exists, it may not be easy
to find, even with a computer capable of symbolic computation. That is why we
should study the numerical solutions to differential equations.

Then, how do we translate the differential equation into a form that can easily
be handled by a computer? First of all, we have to replace the derivative
y'(t) = dy/dt in the differential equation by a numerical derivative (introduced in
Chapter 5), where the step-size h is determined based on the accuracy requirements
and the computation time constraints. Euler’s method approximates the
derivative in Eq. (6.1.1) with Eq. (5.1.2) as


$$\frac{y(t+h) - y(t)}{h} + a\,y(t) = r \;\;\Rightarrow\;\; y(t+h) = (1 - ah)\,y(t) + hr \quad \text{with } y(0) = y_0 \qquad \text{(6.1.3)}$$


and solves this difference equation step-by-step, increasing t by h each time
from t = 0:

$$\begin{aligned} y(h) &= (1-ah)\,y(0) + hr = (1-ah)\,y_0 + hr \\ y(2h) &= (1-ah)\,y(h) + hr = (1-ah)^2 y_0 + (1-ah)\,hr + hr \\ y(3h) &= (1-ah)\,y(2h) + hr = (1-ah)^3 y_0 + \sum_{m=0}^{2}(1-ah)^m\,hr \end{aligned} \qquad \text{(6.1.4)}$$

This is a numeric sequence {y(kh)}, which we call a numerical solution of 
Eq. (6.1.1). 
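Summing the geometric series in Eq. (6.1.4) gives a closed form of the numerical solution (for a ≠ 0):

```latex
y(kh) = (1-ah)^k\,y_0 + hr\sum_{m=0}^{k-1}(1-ah)^m
      = (1-ah)^k\,y_0 + \frac{r}{a}\left[1-(1-ah)^k\right], \qquad a \neq 0
```

With a = r = 1, $y_0 = 0$, and h = 0.5, this reduces to $y(0.5k) = 1 - 0.5^k$, which gives y(0.5) = 0.5, y(1.0) = 0.75, and y(1.5) = 0.875. Moreover, if t = kh is held fixed, $(1-ah)^{t/h} \to e^{-at}$ as $h \to 0$. The closed form therefore tends to the analytical solution (6.1.2).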

To be specific, let the parameters and the initial value of Eq. (6.1.1) be a = 1,
r = 1, and $y_0 = 0$. Then, the analytical solution (6.1.2) becomes

$$y(t) = 1 - e^{-t} \qquad \text{(6.1.5)}$$
%nm610: Euler method to solve a 1st-order differential equation
clear, clf
a = 1; r = 1; y0 = 0; tf = 2;
t = [0:0.01:tf]; yt = 1 - exp(-a*t); %Eq.(6.1.5): true analytical solution
plot(t,yt,'k'), hold on
klasts = [8 4 2]; hs = tf./klasts; %numbers of steps and step sizes
y(1) = y0;
for itr = 1:3 %with various step sizes h = 1/4, 1/2, 1
  klast = klasts(itr); h = hs(itr); y(1) = y0;
  for k = 1:klast
    y(k + 1) = (1 - a*h)*y(k) + h*r; %Eq.(6.1.3)
    plot([k - 1 k]*h,[y(k) y(k+1)],'b', k*h,y(k+1),'ro')
  end
end


and the numerical solutions (6.1.4) with the step sizes h = 0.5 and h = 0.25 are
listed in Table 6.1 and depicted in Fig. 6.1. We make a MATLAB program
“nm610.m”, which uses Euler’s method for the differential equation (6.1.1),
actually solving the difference equation (6.1.3), and plots the graphs of the numerical
solutions in Fig. 6.1. The graphs seem to tell us that a small step-size helps
reduce the error and so brings the numerical solution closer to the (true) analytical
solution. But, as will be investigated thoroughly in Section 6.2, this is only
partially true. In fact, too small a step-size not only makes the computation time
longer (proportional to 1/h), but also results in rather larger errors due to the
accumulated round-off effect. This is why we should look for other methods to
decrease the errors rather than simply reduce the step-size.
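A minimal Python sketch of the scalar Euler iteration (6.1.3) follows, assuming a = r = 1 and $y_0 = 0$ as above. `euler_scalar` is a hypothetical helper, not part of the book's code. It reproduces the h = 0.5 and h = 0.25 columns of Table 6.1 and shows the error at t = 1.5 shrinking as h is halved.

```python
import math

def euler_scalar(a, r, y0, h, n_steps):
    """Return [y(0), y(h), ..., y(n_steps*h)] from y(t+h) = (1 - a*h)*y(t) + h*r, Eq. (6.1.3)."""
    ys = [y0]
    for _ in range(n_steps):
        ys.append((1.0 - a * h) * ys[-1] + h * r)
    return ys

y_half = euler_scalar(1.0, 1.0, 0.0, 0.5, 3)      # t = 0, 0.5, 1.0, 1.5
y_quarter = euler_scalar(1.0, 1.0, 0.0, 0.25, 6)  # t = 0, 0.25, ..., 1.5
true_15 = 1.0 - math.exp(-1.5)                    # Eq. (6.1.5) at t = 1.5

print([round(v, 4) for v in y_half])     # [0.0, 0.5, 0.75, 0.875]
print([round(v, 4) for v in y_quarter])  # [0.0, 0.25, 0.4375, 0.5781, 0.6836, 0.7627, 0.822]
print(abs(y_half[-1] - true_15), abs(y_quarter[-1] - true_15))
```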

Euler’s method can also be applied for solving a first-order vector differential
equation

$$y'(t) = f(t, y) \quad \text{with } y(t_0) = y_0 \qquad \text{(6.1.6)}$$

into which a high-order scalar differential equation can be converted. The algorithm
can be described by

$$y_{k+1} = y_k + h\,f(t_k, y_k) \quad \text{with } y(t_0) = y_0 \qquad \text{(6.1.7)}$$


Table 6.1 A Numerical Solution of the Differential Equation (6.1.1) Obtained by
Euler’s Method

t      h = 0.5                                     h = 0.25
0.25                                               y(0.25) = (1 - ah)y0 + hr = 1/4 = 0.25
0.50   y(0.50) = (1 - ah)y0 + hr = 1/2 = 0.5       y(0.50) = (3/4)y(0.25) + 1/4 = 0.4375
0.75                                               y(0.75) = (3/4)y(0.50) + 1/4 = 0.5781
1.00   y(1.00) = (1/2)y(0.5) + 1/2 = 3/4 = 0.75    y(1.00) = (3/4)y(0.75) + 1/4 = 0.6836
1.25                                               y(1.25) = (3/4)y(1.00) + 1/4 = 0.7627
1.50   y(1.50) = (1/2)y(1.0) + 1/2 = 7/8 = 0.875   y(1.50) = (3/4)y(1.25) + 1/4 = 0.8220
Figure 6.1 Examples of numerical solutions obtained by Euler’s method.


and is cast into the MATLAB routine “ode_Euler()”.


function [t,y] = ode_Euler(f,tspan,y0,N)
%Euler's method to solve vector differential equation y'(t) = f(t,y(t))
% for tspan = [t0,tf] and with the initial value y0 and N time steps
if nargin < 4 || N <= 0, N = 100; end
if nargin < 3, y0 = 0; end
h = (tspan(2) - tspan(1))/N; %stepsize
t = tspan(1) + [0:N]'*h; %time vector
y(1,:) = y0(:)'; %always make the initial value a row vector
for k = 1:N
  y(k + 1,:) = y(k,:) + h*feval(f,t(k),y(k,:)); %Eq.(6.1.7)
end


6.2 HEUN’S METHOD: TRAPEZOIDAL METHOD 

Another method of solving a first-order vector differential equation like Eq. (6.1.6)
comes from integrating both sides of the equation.

$$y'(t) = f(t, y) \;\Rightarrow\; y(t)\Big|_{t_k}^{t_{k+1}} = y(t_{k+1}) - y(t_k) = \int_{t_k}^{t_{k+1}} f(t, y)\,dt$$

$$y(t_{k+1}) = y(t_k) + \int_{t_k}^{t_{k+1}} f(t, y)\,dt \quad \text{with } y(t_0) = y_0 \qquad \text{(6.2.1)}$$

If we assume that the value of the (derivative) function f(t, y) is constant
at $f(t_k, y(t_k))$ within one time step $[t_k, t_{k+1})$, this becomes Eq. (6.1.7) (with
$h = t_{k+1} - t_k$), amounting to Euler’s method. If we use the trapezoidal rule (5.5.3), it
becomes

$$y_{k+1} = y_k + \frac{h}{2}\left\{f(t_k, y_k) + f(t_{k+1}, y_{k+1})\right\} \qquad \text{(6.2.2)}$$
function [t,y] = ode_Heun(f,tspan,y0,N)
%Heun method to solve vector differential equation y'(t) = f(t,y(t))
% for tspan = [t0,tf] and with the initial value y0 and N time steps
if nargin < 4 || N <= 0, N = 100; end
if nargin < 3, y0 = 0; end
h = (tspan(2) - tspan(1))/N; %stepsize
t = tspan(1) + [0:N]'*h; %time vector
y(1,:) = y0(:)'; %always make the initial value a row vector
for k = 1:N
  fk = feval(f,t(k),y(k,:)); y(k+1,:) = y(k,:) + h*fk; %Eq.(6.2.3)
  y(k+1,:) = y(k,:) + h/2*(fk + feval(f,t(k+1),y(k+1,:))); %Eq.(6.2.4)
end


But the right-hand side (RHS) of this equation has $y_{k+1}$, which is unknown at
$t_k$. To resolve this problem, we replace the $y_{k+1}$ on the RHS by the following
approximation:

$$y_{k+1} \cong y_k + h\,f(t_k, y_k) \qquad \text{(6.2.3)}$$

so that it becomes

$$y_{k+1} = y_k + \frac{h}{2}\left\{f(t_k, y_k) + f\big(t_{k+1},\, y_k + h\,f(t_k, y_k)\big)\right\} \qquad \text{(6.2.4)}$$

This is Heun’s method, which is implemented in the MATLAB routine
“ode_Heun()”. It is a kind of predictor-and-corrector method in that it predicts
the value of $y_{k+1}$ by Eq. (6.2.3) at $t_k$ and then corrects the predicted value by
Eq. (6.2.4) at $t_{k+1}$. The truncation error of Heun’s method is $O(h^2)$ (proportional
to $h^2$) as shown in Eq. (5.6.1), while the error of Euler’s method is $O(h)$.
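The predictor-corrector structure of Eqs. (6.2.3) and (6.2.4) can be sketched in a few lines of Python for a scalar ODE. `ode_heun` here is a hypothetical scalar counterpart of “ode_Heun()”, not a line-by-line translation of it. Applied to Eq. (6.1.1) with a = r = 1, $y_0 = 0$, and h = 0.5, its first step gives y(0.5) = 0.375. That is closer to the true value 1 − e^{−0.5} ≈ 0.3935 than Euler’s 0.5 in Table 6.1.

```python
import math

def ode_heun(f, t0, tf, y0, n):
    """Heun's method for a scalar ODE y' = f(t, y) on [t0, tf] with n steps."""
    h = (tf - t0) / n
    ts, ys = [t0], [y0]
    for k in range(n):
        t, y = ts[-1], ys[-1]
        fk = f(t, y)
        y_pred = y + h * fk                           # predictor, Eq. (6.2.3)
        y_next = y + h / 2 * (fk + f(t + h, y_pred))  # corrector, Eq. (6.2.4)
        ts.append(t0 + (k + 1) * h)
        ys.append(y_next)
    return ts, ys

f = lambda t, y: -y + 1.0               # Eq. (6.1.1) with a = r = 1
ts, ys = ode_heun(f, 0.0, 2.0, 0.0, 4)  # h = 0.5
print(ys[1])                            # 0.375
print(abs(ys[-1] - (1.0 - math.exp(-2.0))))
```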


6.3 RUNGE-KUTTA METHOD 

Although Heun’s method is a little better than Euler’s method, it is still not
accurate enough for most real-world problems. The fourth-order Runge-Kutta
(RK4) method, which has a truncation error of $O(h^4)$, is one of the most widely used
methods for solving differential equations. Its algorithm is described below.

$$y_{k+1} = y_k + \frac{h}{6}\left(f_{k1} + 2f_{k2} + 2f_{k3} + f_{k4}\right) \qquad \text{(6.3.1)}$$

where

$$f_{k1} = f(t_k, y_k) \qquad \text{(6.3.2a)}$$

$$f_{k2} = f(t_k + h/2,\; y_k + f_{k1}h/2) \qquad \text{(6.3.2b)}$$

$$f_{k3} = f(t_k + h/2,\; y_k + f_{k2}h/2) \qquad \text{(6.3.2c)}$$

$$f_{k4} = f(t_k + h,\; y_k + f_{k3}h) \qquad \text{(6.3.2d)}$$
function [t,y] = ode_RK4(f,tspan,y0,N,varargin)
%Runge-Kutta method to solve vector differential eqn y'(t) = f(t,y(t))
% for tspan = [t0,tf] and with the initial value y0 and N time steps
if nargin < 4 || N <= 0, N = 100; end
if nargin < 3, y0 = 0; end
y(1,:) = y0(:)'; %make it a row vector
h = (tspan(2) - tspan(1))/N; t = tspan(1) + [0:N]'*h;
for k = 1:N
  f1 = h*feval(f,t(k),y(k,:),varargin{:}); f1 = f1(:)'; %(6.3.2a)
  f2 = h*feval(f,t(k) + h/2,y(k,:) + f1/2,varargin{:}); f2 = f2(:)'; %(6.3.2b)
  f3 = h*feval(f,t(k) + h/2,y(k,:) + f2/2,varargin{:}); f3 = f3(:)'; %(6.3.2c)
  f4 = h*feval(f,t(k) + h,y(k,:) + f3,varargin{:}); f4 = f4(:)'; %(6.3.2d)
  y(k + 1,:) = y(k,:) + (f1 + 2*(f2 + f3) + f4)/6; %Eq.(6.3.1)
end


%nm630: Euler/Heun/RK4 method to solve a differential equation (d.e.)
clear, clf
tspan = [0 2];
t = tspan(1) + [0:100]*(tspan(2) - tspan(1))/100;
a = 1; yt = 1 - exp(-a*t); %Eq.(6.1.5): true analytical solution
plot(t,yt,'k'), hold on
df61 = inline('-y + 1','t','y'); %Eq.(6.1.1): d.e. to be solved
y0 = 0; N = 4;
[t1,ye] = ode_Euler(df61,tspan,y0,N);
[t1,yh] = ode_Heun(df61,tspan,y0,N);
[t1,yr] = ode_RK4(df61,tspan,y0,N);
plot(t,yt,'k', t1,ye,'b:', t1,yh,'b:', t1,yr,'r:')
plot(t1,ye,'bo', t1,yh,'b+', t1,yr,'r*')
N = 1e3; %to estimate the time for N iterations
tic, [t1,ye] = ode_Euler(df61,tspan,y0,N); time_Euler = toc
tic, [t1,yh] = ode_Heun(df61,tspan,y0,N); time_Heun = toc
tic, [t1,yr] = ode_RK4(df61,tspan,y0,N); time_RK4 = toc


Equation (6.3.1) is the core of the RK4 method, which may be obtained by
substituting Simpson’s rule (5.5.4)

$$\int_{x_k}^{x_{k+1}} f(x)\,dx \cong \frac{h'}{3}\left(f_k + 4f_{k+1/2} + f_{k+1}\right) \quad \text{with } h' = \frac{x_{k+1} - x_k}{2} = \frac{h}{2} \qquad \text{(6.3.3)}$$

into the integral form (6.2.1) of the differential equation and replacing $f_{k+1/2}$ with
the average of the successive function values $(f_{k2} + f_{k3})/2$. Accordingly, the
RK4 method has a truncation error of $O(h^4)$ as in Eq. (5.6.2) and thus is expected
to work better than the previous two methods.

The fourth-order Runge-Kutta (RK4) method is cast into the MATLAB routine
“ode_RK4()”. The program “nm630.m” uses this routine to solve Eq. (6.1.1)
with the step size $h = (t_f - t_0)/N = 2/4 = 0.5$ and plots the numerical result
together with the (true) analytical solution. Comparison of this result with those of
Euler’s method (“ode_Euler()”) and Heun’s method (“ode_Heun()”) is given in
Fig. 6.2, which shows that the RK4 method is better than Heun’s method, while
Euler’s method is the worst in terms of accuracy with the same step-size. But
in terms of computational load, the order is reversed, because Euler’s method,
Heun’s method, and the RK4 method need 1, 2, and 4 function evaluations (calls)
per iteration, respectively.
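To make the accuracy/cost trade-off concrete, the following Python sketch integrates Eq. (6.1.1) with a = r = 1 over [0, 2] using N = 4 steps, as in “nm630.m”. It uses hypothetical helpers and handles the scalar case only. It also counts how many times the right-hand side is called. The counts should come out as 1, 2, and 4 calls per step for Euler, Heun, and RK4, respectively.

```python
import math

def euler_step(f, t, y, h):
    return y + h * f(t, y)                          # Eq. (6.1.7)

def heun_step(f, t, y, h):
    fk = f(t, y)
    return y + h / 2 * (fk + f(t + h, y + h * fk))  # Eq. (6.2.4)

def rk4_step(f, t, y, h):
    f1 = h * f(t, y)                                # h times Eq. (6.3.2a)
    f2 = h * f(t + h / 2, y + f1 / 2)               # h times Eq. (6.3.2b)
    f3 = h * f(t + h / 2, y + f2 / 2)               # h times Eq. (6.3.2c)
    f4 = h * f(t + h, y + f3)                       # h times Eq. (6.3.2d)
    return y + (f1 + 2 * (f2 + f3) + f4) / 6        # Eq. (6.3.1)

def solve(step, f, t0, tf, y0, n):
    """Apply a one-step method n times and return the value at tf."""
    h = (tf - t0) / n
    y = y0
    for k in range(n):
        y = step(f, t0 + k * h, y, h)
    return y

exact = 1.0 - math.exp(-2.0)
errors, calls = {}, {}
for name, step in (("euler", euler_step), ("heun", heun_step), ("rk4", rk4_step)):
    count = [0]
    def f(t, y, count=count):
        count[0] += 1          # tally right-hand-side evaluations
        return -y + 1.0        # Eq. (6.1.1) with a = r = 1
    errors[name] = abs(solve(step, f, 0.0, 2.0, 0.0, 4) - exact)
    calls[name] = count[0]

print(errors)
print(calls)   # {'euler': 4, 'heun': 8, 'rk4': 16}
```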

(cf) Note that a function call takes much more time than a multiplication and thus the
number of function calls should be a criterion in estimating and comparing
computational time.

The MATLAB built-in routines “ode23()” and “ode45()” implement the
Runge-Kutta method with an adaptive step-size adjustment, which uses a
large/small step-size depending on whether f(t) is smooth or rough. In
Section 6.4.3, we will try applying these routines together with our routines to
solve a differential equation for practice rather than for comparison.


6.4 PREDICTOR-CORRECTOR METHOD 
6.4.1 Adams-Bashforth-Moulton Method 

The Adams-Bashforth-Moulton (ABM) method consists of two steps. The first
step is to approximate f(t, y) by the (Lagrange) polynomial of degree 3 matching
the four points

$$\{(t_{k-3}, f_{k-3}),\ (t_{k-2}, f_{k-2}),\ (t_{k-1}, f_{k-1}),\ (t_k, f_k)\}$$

and substitute the polynomial into the integral form (6.2.1) of the differential equation
to get a predicted estimate of $y_{k+1}$:

$$p_{k+1} = y_k + \int_{t_k}^{t_{k+1}} l_3(t)\,dt = y_k + \frac{h}{24}\left(-9f_{k-3} + 37f_{k-2} - 59f_{k-1} + 55f_k\right) \qquad \text{(6.4.1a)}$$
The second step is to repeat the same work with the updated four points
$\{(t_{k-2}, f_{k-2}), (t_{k-1}, f_{k-1}), (t_k, f_k), (t_{k+1}, f_{k+1})\}$ (with $f_{k+1} = f(t_{k+1}, p_{k+1})$)
to get a corrected estimate of $y_{k+1}$:

$$c_{k+1} = y_k + \int_{t_k}^{t_{k+1}} \tilde{l}_3(t)\,dt = y_k + \frac{h}{24}\left(f_{k-2} - 5f_{k-1} + 19f_k + 9f_{k+1}\right) \qquad \text{(6.4.1b)}$$

The coefficients of Eqs. (6.4.1a) and (6.4.1b) can be obtained by using the
MATLAB routines “lagranp()” and “polyint()”, which generate
Lagrange (coefficient) polynomials and integrate a polynomial, respectively.
Let’s try running the program “ABMc.m”.

>>abmc

cAP = -3/8 37/24 -59/24 55/24

cAC = 1/24 -5/24 19/24 3/8


%ABMc.m
% Predictor/Corrector coefficients in Adams-Bashforth-Moulton method
format rat
[l,L] = lagranp([-3 -2 -1 0],[0 0 0 0]); %only coefficient polynomial L
for m = 1:4
  iL = polyint(L(m,:)); %indefinite integral of polynomial
  cAP(m) = polyval(iL,1) - polyval(iL,0); %definite integral over [0,1]
end
cAP %Predictor coefficients
[l,L] = lagranp([-2 -1 0 1],[0 0 0 0]); %only coefficient polynomial L
for m = 1:4
  iL = polyint(L(m,:)); %indefinite integral of polynomial
  cAC(m) = polyval(iL,1) - polyval(iL,0); %definite integral over [0,1]
end
cAC %Corrector coefficients
format short
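Without “lagranp()”, the same coefficients can be cross-checked in Python with exact rational arithmetic. The idea is to build each Lagrange basis polynomial on the nodes −3, −2, −1, 0 (predictor) or −2, −1, 0, 1 (corrector), measured in units of h from $t_k$, and integrate it over [0, 1]. The helper functions are hypothetical, and this is a sketch rather than a port of “ABMc.m”.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials stored as coefficient lists in ascending powers."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def lagrange_basis(nodes, m):
    """Ascending-power coefficients of the m-th Lagrange basis polynomial on nodes."""
    poly = [Fraction(1)]
    for j, xj in enumerate(nodes):
        if j != m:
            denom = Fraction(nodes[m] - xj)
            poly = poly_mul(poly, [Fraction(-xj) / denom, Fraction(1) / denom])
    return poly

def integrate_0_1(poly):
    """Exact integral over [0, 1] of a polynomial in ascending powers."""
    return sum(c / (k + 1) for k, c in enumerate(poly))

def abm_coefficients(nodes):
    """Integrate every Lagrange basis polynomial over one step [0, 1] (units of h)."""
    return [integrate_0_1(lagrange_basis(nodes, m)) for m in range(len(nodes))]

cAP = abm_coefficients([-3, -2, -1, 0])  # predictor: f_{k-3}, ..., f_k
cAC = abm_coefficients([-2, -1, 0, 1])   # corrector: f_{k-2}, ..., f_{k+1}
print([str(c) for c in cAP])  # ['-3/8', '37/24', '-59/24', '55/24']
print([str(c) for c in cAC])  # ['1/24', '-5/24', '19/24', '3/8']
```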


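Alternatively, we write the Taylor series expansion of $y_{k+1}$ about $t_k$ and that
of $y_k$ about $t_{k+1}$ as

$$y_{k+1} = y_k + h f_k + \frac{h^2}{2}f_k' + \frac{h^3}{3!}f_k'' + \frac{h^4}{4!}f_k^{(3)} + \frac{h^5}{5!}f_k^{(4)} + \cdots \qquad \text{(6.4.2a)}$$

$$\begin{aligned} y_k &= y_{k+1} - h f_{k+1} + \frac{h^2}{2}f_{k+1}' - \frac{h^3}{3!}f_{k+1}'' + \frac{h^4}{4!}f_{k+1}^{(3)} - \frac{h^5}{5!}f_{k+1}^{(4)} + \cdots \\ y_{k+1} &= y_k + h f_{k+1} - \frac{h^2}{2}f_{k+1}' + \frac{h^3}{3!}f_{k+1}'' - \frac{h^4}{4!}f_{k+1}^{(3)} + \frac{h^5}{5!}f_{k+1}^{(4)} - \cdots \end{aligned} \qquad \text{(6.4.2b)}$$

and replace the first, second, and third derivatives by their difference
approximations.

$$\begin{aligned} y_{k+1} &= y_k + h f_k + \frac{h^2}{2}\left(\frac{-2f_{k-3} + 9f_{k-2} - 18f_{k-1} + 11f_k}{6h} + \frac{1}{4}h^3 f_k^{(4)}\right) \\ &\quad + \frac{h^3}{3!}\left(\frac{-f_{k-3} + 4f_{k-2} - 5f_{k-1} + 2f_k}{h^2} + \frac{11}{12}h^2 f_k^{(4)}\right) \\ &\quad + \frac{h^4}{4!}\left(\frac{-f_{k-3} + 3f_{k-2} - 3f_{k-1} + f_k}{h^3} + \frac{3}{2}h f_k^{(4)}\right) + \frac{h^5}{120}f_k^{(4)} + \cdots \\ &= y_k + \frac{h}{24}\left(-9f_{k-3} + 37f_{k-2} - 59f_{k-1} + 55f_k\right) + \frac{251}{720}h^5 f_k^{(4)} + \cdots \end{aligned} \qquad \text{(6.4.3a)}$$

$$\begin{aligned} y_{k+1} &= y_k + h f_{k+1} - \frac{h^2}{2}\left(\frac{-2f_{k-2} + 9f_{k-1} - 18f_k + 11f_{k+1}}{6h} + \frac{1}{4}h^3 f_{k+1}^{(4)}\right) \\ &\quad + \frac{h^3}{3!}\left(\frac{-f_{k-2} + 4f_{k-1} - 5f_k + 2f_{k+1}}{h^2} + \frac{11}{12}h^2 f_{k+1}^{(4)}\right) \\ &\quad - \frac{h^4}{4!}\left(\frac{-f_{k-2} + 3f_{k-1} - 3f_k + f_{k+1}}{h^3} + \frac{3}{2}h f_{k+1}^{(4)}\right) + \frac{h^5}{120}f_{k+1}^{(4)} - \cdots \\ &= y_k + \frac{h}{24}\left(f_{k-2} - 5f_{k-1} + 19f_k + 9f_{k+1}\right) - \frac{19}{720}h^5 f_{k+1}^{(4)} + \cdots \end{aligned} \qquad \text{(6.4.3b)}$$

These derivations are supported by running the MATLAB program “ABMc1.m”.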


%ABMc1.m
%another way to get the ABM coefficients together with the error term
clear, format rat
for i = 1:3, [ci,erri] = difapx(i,[-3 0]); c(i,:) = ci; err(i) = erri; end
cAP = [0 0 0 1] + [1/2 1/6 1/24]*c, errp = -[1/2 1/6 1/24]*err' + 1/120
cAC = [0 0 0 1] + [-1/2 1/6 -1/24]*c, errc = -[-1/2 1/6 -1/24]*err' + 1/120
format short


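From these equations and under the assumption that $f_{k+1}^{(4)} \cong f_k^{(4)} \cong K$, we can
write the predictor/corrector errors as

$$E_{p,k+1} = y_{k+1} - p_{k+1} \cong \frac{251}{720}h^5 f_k^{(4)} \cong \frac{251}{720}Kh^5 \qquad \text{(6.4.4a)}$$

$$E_{c,k+1} = y_{k+1} - c_{k+1} \cong -\frac{19}{720}h^5 f_{k+1}^{(4)} \cong -\frac{19}{720}Kh^5 \qquad \text{(6.4.4b)}$$

We still cannot use these formulas to estimate the predictor/corrector errors, since
K is unknown. But, from the difference between these two formulas

$$E_{p,k+1} - E_{c,k+1} = c_{k+1} - p_{k+1} \cong \frac{270}{720}Kh^5 = \frac{270}{251}E_{p,k+1} = -\frac{270}{19}E_{c,k+1} \qquad \text{(6.4.5)}$$

we can get the practical formulas for estimating the errors as

$$E_{p,k+1} = y_{k+1} - p_{k+1} \cong \frac{251}{270}\left(c_{k+1} - p_{k+1}\right) \qquad \text{(6.4.6a)}$$

$$E_{c,k+1} = y_{k+1} - c_{k+1} \cong -\frac{19}{270}\left(c_{k+1} - p_{k+1}\right) \qquad \text{(6.4.6b)}$$

These formulas give us rough estimates of how close the predicted/corrected
values are to the true value and so can be used to improve them as well as to
adjust the step-size. Since $c_{k+1}$ is not yet available when the prediction is made,
the modifier for the predicted value uses the difference $c_k - p_k$ from the previous
step instead:

$$p_{k+1} \rightarrow p_{k+1} + \frac{251}{270}\left(c_k - p_k\right) \Rightarrow m_{k+1} \qquad \text{(6.4.7a)}$$

$$c_{k+1} \rightarrow c_{k+1} - \frac{19}{270}\left(c_{k+1} - p_{k+1}\right) \Rightarrow y_{k+1} \qquad \text{(6.4.7b)}$$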


These modification formulas are expected to reward the effort we have
made to derive them.

The Adams-Bashforth-Moulton (ABM) method with the modification formulas
can be described by Eqs. (6.4.1a), (6.4.1b), (6.4.7a), and (6.4.7b), summarized
below, and is cast into the MATLAB routine “ode_ABM()”. This scheme needs
only two function evaluations (calls) per iteration, while having a truncation
error of $O(h^5)$, and thus is expected to work better than the methods discussed so
far. It is implemented by the MATLAB built-in routine “ode113()” with many
additional sophisticated techniques.


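(Adams-Bashforth-Moulton method with modification formulas)

Predictor:

$$p_{k+1} = y_k + \frac{h}{24}\left(-9f_{k-3} + 37f_{k-2} - 59f_{k-1} + 55f_k\right) \qquad \text{(6.4.8a)}$$

Modifier:

$$m_{k+1} = p_{k+1} + \frac{251}{270}\left(c_k - p_k\right) \qquad \text{(6.4.8b)}$$

Corrector:

$$c_{k+1} = y_k + \frac{h}{24}\left(f_{k-2} - 5f_{k-1} + 19f_k + 9f(t_{k+1}, m_{k+1})\right) \qquad \text{(6.4.8c)}$$

$$y_{k+1} = c_{k+1} - \frac{19}{270}\left(c_{k+1} - p_{k+1}\right) \qquad \text{(6.4.8d)}$$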
function [t,y] = ode_ABM(f,tspan,y0,N,KC,varargin)
%Adams-Bashforth-Moulton method to solve vector d.e. y'(t) = f(t,y(t))
% for tspan = [t0,tf] and with the initial value y0 and N time steps
% using the modifier based on the error estimate depending on KC = 1/0
if nargin < 5, KC = 1; end %with modifier by default
if nargin < 4 || N <= 0, N = 100; end %default maximum number of iterations
y0 = y0(:)'; %make it a row vector
h = (tspan(2) - tspan(1))/N; %step size
tspan0 = tspan(1) + [0 3]*h;
[t,y] = ode_RK4(f,tspan0,y0,3,varargin{:}); %initialize by Runge-Kutta
t = [t(1:3)' t(4):h:tspan(2)]';
for k = 1:4, F(k,:) = feval(f,t(k),y(k,:),varargin{:}); end
p = y(4,:); c = y(4,:); KC22 = KC*251/270; KC12 = KC*19/270;
h24 = h/24; h241 = h24*[1 -5 19 9]; h249 = h24*[-9 37 -59 55];
for k = 4:N
  p1 = y(k,:) + h249*F; %Eq.(6.4.8a)
  m1 = p1 + KC22*(c - p); %Eq.(6.4.8b)
  c1 = y(k,:) + ...
       h241*[F(2:4,:); feval(f,t(k + 1),m1,varargin{:})]; %Eq.(6.4.8c)
  y(k + 1,:) = c1 - KC12*(c1 - p1); %Eq.(6.4.8d)
  p = p1; c = c1; %update the predicted/corrected values
  F = [F(2:4,:); feval(f,t(k + 1),y(k + 1,:),varargin{:})];
end
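As a cross-check outside MATLAB, here is a rough scalar Python sketch of the same predictor-modifier-corrector loop, Eqs. (6.4.8a) to (6.4.8d). It is started with three RK4 steps, as “ode_ABM()” does. `ode_abm` and `rk4_step` are hypothetical names, and the `use_modifier` flag plays the role of KC. It is applied to Eq. (6.4.10) over [0, 10] with N = 50, as in “nm643_1.m”, and the assertions below only check that both variants track 1 − e^{−t} to within 10⁻³.

```python
import math

def rk4_step(f, t, y, h):
    """One classical RK4 step, Eqs. (6.3.1)-(6.3.2), for scalar y (used only for start-up)."""
    f1 = h * f(t, y)
    f2 = h * f(t + h / 2, y + f1 / 2)
    f3 = h * f(t + h / 2, y + f2 / 2)
    f4 = h * f(t + h, y + f3)
    return y + (f1 + 2 * (f2 + f3) + f4) / 6

def ode_abm(f, t0, tf, y0, n, use_modifier=True):
    """ABM predictor-corrector, Eqs. (6.4.8a)-(6.4.8d), for scalar y with n >= 3 steps."""
    h = (tf - t0) / n
    ts = [t0 + k * h for k in range(n + 1)]
    ys = [y0]
    for k in range(3):                                  # y_1, y_2, y_3 by RK4
        ys.append(rk4_step(f, ts[k], ys[k], h))
    F = [f(ts[k], ys[k]) for k in range(4)]             # f_{k-3}, f_{k-2}, f_{k-1}, f_k
    kc = 1.0 if use_modifier else 0.0                   # plays the role of KC
    p = c = ys[3]                                       # c_k - p_k = 0 at the first step
    for k in range(3, n):
        p1 = ys[k] + h / 24 * (-9 * F[0] + 37 * F[1] - 59 * F[2] + 55 * F[3])      # (6.4.8a)
        m1 = p1 + kc * 251 / 270 * (c - p)                                          # (6.4.8b)
        c1 = ys[k] + h / 24 * (F[1] - 5 * F[2] + 19 * F[3] + 9 * f(ts[k + 1], m1))  # (6.4.8c)
        y1 = c1 - kc * 19 / 270 * (c1 - p1)                                         # (6.4.8d)
        ys.append(y1)
        p, c = p1, c1
        F = F[1:] + [f(ts[k + 1], y1)]                  # slide the window of f values
    return ts, ys

f = lambda t, y: -y + 1.0                               # Eq. (6.4.10)
max_err = {}
for use_mod in (False, True):
    ts, ys = ode_abm(f, 0.0, 10.0, 0.0, 50, use_modifier=use_mod)
    max_err[use_mod] = max(abs(y - (1.0 - math.exp(-t))) for t, y in zip(ts, ys))
print(max_err)
```

Only two calls to `f` are made per pass through the main loop: one at the modified value and one at the final value. This matches the cost stated above.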


6.4.2 Hamming Method 


function [t,y] = ode_Ham(f,tspan,y0,N,KC,varargin)
% Hamming method to solve vector d.e. y'(t) = f(t,y(t))
% for tspan = [t0,tf] and with the initial value y0 and N time steps
% using the modifier based on the error estimate depending on KC = 1/0
if nargin < 5, KC = 1; end %with modifier by default
if nargin < 4 || N <= 0, N = 100; end %default maximum number of iterations
if nargin < 3, y0 = 0; end %default initial value
y0 = y0(:)'; %make it a row vector
h = (tspan(2) - tspan(1))/N; %step size
tspan0 = tspan(1) + [0 3]*h;
[t,y] = ode_RK4(f,tspan0,y0,3,varargin{:}); %Initialize by Runge-Kutta
t = [t(1:3)' t(4):h:tspan(2)]';
for k = 2:4, F(k - 1,:) = feval(f,t(k),y(k,:),varargin{:}); end
p = y(4,:); c = y(4,:); h34 = h/3*4; KC11 = KC*112/121; KC91 = KC*9/121;
h312 = 3*h*[-1 2 1];
for k = 4:N
  p1 = y(k - 3,:) + h34*(2*(F(1,:) + F(3,:)) - F(2,:)); %Eq.(6.4.9a)
  m1 = p1 + KC11*(c - p); %Eq.(6.4.9b)
  c1 = (-y(k - 2,:) + 9*y(k,:) + ...
        h312*[F(2:3,:); feval(f,t(k + 1),m1,varargin{:})])/8; %Eq.(6.4.9c)
  y(k + 1,:) = c1 - KC91*(c1 - p1); %Eq.(6.4.9d)
  p = p1; c = c1; %update the predicted/corrected values
  F = [F(2:3,:); feval(f,t(k + 1),y(k + 1,:),varargin{:})];
end
(Hamming method with modification formulas)

Predictor:

$$p_{k+1} = y_{k-3} + \frac{4h}{3}\left(2f_{k-2} - f_{k-1} + 2f_k\right) \qquad \text{(6.4.9a)}$$

Modifier:

$$m_{k+1} = p_{k+1} + \frac{112}{121}\left(c_k - p_k\right) \qquad \text{(6.4.9b)}$$

Corrector:

$$c_{k+1} = \frac{1}{8}\left\{9y_k - y_{k-2} + 3h\left(-f_{k-1} + 2f_k + f(t_{k+1}, m_{k+1})\right)\right\} \qquad \text{(6.4.9c)}$$

$$y_{k+1} = c_{k+1} - \frac{9}{121}\left(c_{k+1} - p_{k+1}\right) \qquad \text{(6.4.9d)}$$


In this section, we introduce just the algorithm of the Hamming method [H-1],
summarized in the box above, and the corresponding routine “ode_Ham()”. It
is another multistep predictor-corrector method, like the Adams-Bashforth-Moulton
(ABM) method.

This scheme also needs only two function evaluations (calls) per iteration,
while having an error of $O(h^5)$, and so is comparable with the ABM method
discussed in the previous section.
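A comparable scalar Python sketch of Eqs. (6.4.9a) to (6.4.9d) is shown below, again with hypothetical names and an RK4 start. It shows how the Hamming scheme keeps only three past f values, while reaching back to $y_{k-3}$ and $y_{k-2}$ in the solution history. It is a sketch for the scalar case, not a translation of “ode_Ham()”. The assertions only check accuracy on Eq. (6.4.10).

```python
import math

def rk4_step(f, t, y, h):
    """One classical RK4 step, Eqs. (6.3.1)-(6.3.2), for scalar y (used only for start-up)."""
    f1 = h * f(t, y)
    f2 = h * f(t + h / 2, y + f1 / 2)
    f3 = h * f(t + h / 2, y + f2 / 2)
    f4 = h * f(t + h, y + f3)
    return y + (f1 + 2 * (f2 + f3) + f4) / 6

def ode_ham(f, t0, tf, y0, n, use_modifier=True):
    """Hamming's method with optional modifiers, Eqs. (6.4.9a)-(6.4.9d), scalar y, n >= 3."""
    h = (tf - t0) / n
    ts = [t0 + k * h for k in range(n + 1)]
    ys = [y0]
    for k in range(3):                                   # y_1, y_2, y_3 by RK4
        ys.append(rk4_step(f, ts[k], ys[k], h))
    F = [f(ts[k], ys[k]) for k in (1, 2, 3)]             # f_{k-2}, f_{k-1}, f_k
    kc = 1.0 if use_modifier else 0.0                    # plays the role of KC
    p = c = ys[3]                                        # c_k - p_k = 0 at the first step
    for k in range(3, n):
        p1 = ys[k - 3] + 4 * h / 3 * (2 * F[0] - F[1] + 2 * F[2])     # (6.4.9a)
        m1 = p1 + kc * 112 / 121 * (c - p)                            # (6.4.9b)
        c1 = (9 * ys[k] - ys[k - 2]
              + 3 * h * (-F[1] + 2 * F[2] + f(ts[k + 1], m1))) / 8    # (6.4.9c)
        y1 = c1 - kc * 9 / 121 * (c1 - p1)                            # (6.4.9d)
        ys.append(y1)
        p, c = p1, c1
        F = F[1:] + [f(ts[k + 1], y1)]                   # slide the window of f values
    return ts, ys

f = lambda t, y: -y + 1.0                                # Eq. (6.4.10)
max_err = {}
for use_mod in (False, True):
    ts, ys = ode_ham(f, 0.0, 10.0, 0.0, 50, use_modifier=use_mod)
    max_err[use_mod] = max(abs(y - (1.0 - math.exp(-t))) for t, y in zip(ts, ys))
print(max_err)
```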

6.4.3 Comparison of Methods 

The major factors to be considered in evaluating/comparing different numerical
methods are the accuracy of the numerical solution and its computation
time. In this section, we will compare the routines “ode_RK4()”, “ode_ABM()”,
“ode_Ham()”, “ode23()”, “ode45()”, and “ode113()” by trying them out on
the same differential equations, hopefully to make some conjectures about their
performances. It is important to note that the evaluation/comparison of numerical
methods is not so simple, because their performances may depend on the
characteristics of the problem at hand. It should also be noted that there are other
factors to be considered, such as stability, versatility, robustness against run-time
errors, and so on. Most of the MATLAB built-in routines take these points into
account.

The first thing we are going to do is to validate the effectiveness of the
modifiers (Eqs. (6.4.8b,d) and (6.4.9b,d)) in the ABM (Adams-Bashforth-Moulton)
method and the Hamming method. For this job, we write and run the program
“nm643_1.m” to get the results depicted in Fig. 6.3 for the differential equation

$$y'(t) = -y(t) + 1 \quad \text{with } y(0) = 0 \qquad \text{(6.4.10)}$$

which was given at the beginning of this chapter. Fig. 6.3 shows an interesting
fact: although the ABM method and the Hamming method, even without
modifiers, are theoretically expected to have better accuracy than the RK4 (fourth-order
Runge-Kutta) method, they turn out to work better than RK4 only with
modifiers. Of course, this is not always the case, as illustrated in Fig. 6.4, which
Figure 6.3 Numerical solutions and their errors for the differential equation y'(t) = −y(t) + 1: (a1) numerical solutions without modifiers, (b1) relative errors without modifiers, (a2) numerical solutions with modifiers, (b2) relative errors with modifiers.

Figure 6.4 Numerical solutions and their errors for the differential equation y'(t) = y(t) + 1: (a1) numerical solutions without modifiers, (b1) relative errors without modifiers, (a2) numerical solutions with modifiers, (b2) relative errors with modifiers, (a3) numerical solutions by ode23, ode45, and ode113, (b3) their relative errors.
we obtained by applying the same routines to solve another differential equation

$$y'(t) = y(t) + 1 \quad \text{with } y(0) = 0 \qquad \text{(6.4.11)}$$

where the true analytical solution is

$$y(t) = e^t - 1 \qquad \text{(6.4.12)}$$


%nm643_1: RK4/Adams/Hamming method to solve a differential eq
clear, clf
t0 = 0; tf = 10; y0 = 0; %starting/final time, initial value
N = 50; %number of segments
df643 = inline('-y+1','t','y'); %differential equation to solve
f643 = inline('1-exp(-t)','t'); %true analytical solution
for KC = 0:1
  tic, [t1,yR] = ode_RK4(df643,[t0 tf],y0,N); tR = toc
  tic, [t1,yA] = ode_ABM(df643,[t0 tf],y0,N,KC); tA = toc
  tic, [t1,yH] = ode_Ham(df643,[t0 tf],y0,N,KC); tH = toc
  yt1 = f643(t1); %true analytical solution to plot
  subplot(221 + KC*2) %plot analytical/numerical solutions
  plot(t1,yt1,'k', t1,yR,'k', t1,yA,'k--', t1,yH,'k:')
  tmp = abs(yt1) + eps; l_t1 = length(t1);
  eR = abs(yR - yt1)./tmp; e_R = norm(eR)/l_t1
  eA = abs(yA - yt1)./tmp; e_A = norm(eA)/l_t1
  eH = abs(yH - yt1)./tmp; e_H = norm(eH)/l_t1
  subplot(222 + KC*2) %plot relative errors
  plot(t1,eR,'k', t1,eA,'k--', t1,eH,'k:')
end


%nm643_2: ode23()/ode45()/ode113() to solve a differential eq
clear, clf
t0 = 0; tf = 10; y0 = 0; N = 50; %starting/final time, initial value
df643 = inline('y + 1','t','y'); %differential equation to solve
f643 = inline('exp(t) - 1','t'); %true analytical solution
tic, [t1,yR] = ode_RK4(df643,[t0 tf],y0,N); time(1) = toc;
tic, [t1,yA] = ode_ABM(df643,[t0 tf],y0,N); time(2) = toc;
yt1 = f643(t1);
tmp = abs(yt1) + eps; l_t1 = length(t1);
eR = abs(yR - yt1)./tmp; err(1) = norm(eR)/l_t1;
eA = abs(yA - yt1)./tmp; err(2) = norm(eA)/l_t1;
options = odeset('RelTol',1e-4); %set the tolerance of relative error
tic, [t23,yode23] = ode23(df643,[t0 tf],y0,options); time(3) = toc;
tic, [t45,yode45] = ode45(df643,[t0 tf],y0,options); time(4) = toc;
tic, [t113,yode113] = ode113(df643,[t0 tf],y0,options); time(5) = toc;
yt23 = f643(t23); tmp = abs(yt23) + eps;
eode23 = abs(yode23 - yt23)./tmp; err(3) = norm(eode23)/length(t23);
yt45 = f643(t45); tmp = abs(yt45) + eps;
eode45 = abs(yode45 - yt45)./tmp; err(4) = norm(eode45)/length(t45);
yt113 = f643(t113); tmp = abs(yt113) + eps;
eode113 = abs(yode113 - yt113)./tmp; err(5) = norm(eode113)/length(t113);
subplot(221), plot(t23,yode23,'k', t45,yode45,'b', t113,yode113,'r')
subplot(222), plot(t23,eode23,'k', t45,eode45,'b--', t113,eode113,'r:')
err, time
Table 6.2 Results of Applying Several Routines to Solve a Simple Differential Equation

                 ode_RK4()     ode_ABM()     ode_Ham()     ode23()       ode45()       ode113()
Relative error   0.0925×10^-4  0.0203×10^-4  0.0179×10^-4  0.4770×10^-4  0.0422×10^-4  0.1249×10^-4
Computing time   0.05 sec      0.03 sec      0.03 sec      0.07 sec      0.05 sec      0.05 sec


Readers are invited to supplement the program “nm643_2.m” in such a way
that “ode_Ham()” is also used to solve Eq. (6.4.11). Running the program yields
the results depicted in Fig. 6.4 and listed in Table 6.2. From Fig. 6.4, it is noteworthy
that, without the modifiers, the ABM method seems to be better than the
Hamming method. With the modifiers, however, it is the other way around, or at
least they run neck and neck. We can also see that the predictor-corrector
methods such as the ABM method (ode_ABM()) and the Hamming method
(ode_Ham()) give us a better numerical solution, with less error and shorter
computation time, than the MATLAB built-in routines “ode23()”, “ode45()”, and
“ode113()” as well as the RK4 method (ode_RK4()), as listed in Table 6.2. However,
a general conclusion should not be drawn from just one example.


6.5 VECTOR DIFFERENTIAL EQUATIONS 
6.5.1 State Equation 

Although we have so far applied the MATLAB routines only to scalar differential
equations, all the routines, whether written by us or built into MATLAB, can
also handle first-order vector differential equations, called state equations, such as

$$\begin{aligned} x_1'(t) &= f_1(t, x_1(t), x_2(t), \ldots) \quad \text{with } x_1(t_0) = x_{10}\\ x_2'(t) &= f_2(t, x_1(t), x_2(t), \ldots) \quad \text{with } x_2(t_0) = x_{20}\\ &\;\;\vdots \end{aligned}$$

$$x'(t) = f(t, x(t)) \quad \text{with } x(t_0) = x_0 \tag{6.5.1}$$


For example, we can define the system of first-order differential equations

$$\begin{aligned} x_1'(t) &= x_2(t) \quad \text{with } x_1(0) = 1\\ x_2'(t) &= -x_2(t) + 1 \quad \text{with } x_2(0) = -1 \end{aligned} \tag{6.5.2}$$

in a file named “df651.m” and solve it by running the MATLAB program
“nm651_1.m”, which uses the routines “ode_Ham()” and “ode45()” to get the
numerical solutions and plots the results as depicted in Fig. 6.5. Note that, for a
vector-valued function, the function given as the first input argument of “ode45()”
must return its value as a column vector or, at least, as a vector of the same
shape as the input argument ‘x’.
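For readers working outside MATLAB, here is a minimal Python sketch (standard library only) of how a fixed-step RK4 routine can treat the state equation (6.5.2) as a vector. The right-hand-side function returns a list with one derivative per state variable, just as “df651()” fills in dx(1) and dx(2). The names `df651` and `rk4_vector` are illustrative and are not the book's “ode_RK4()”. The result at t = 2 is compared with the analytical solution derived below in Eq. (6.5.12).

```python
import math

def df651(t, x):
    """Right-hand side of Eq. (6.5.2): x1' = x2, x2' = -x2 + 1."""
    return [x[1], -x[1] + 1.0]

def rk4_vector(f, t0, tf, x0, n):
    """Classical fourth-order Runge-Kutta method applied to a state vector.

    f(t, x) must return a list with the same length as x.
    Returns the time grid and the list of state vectors at the grid points.
    """
    h = (tf - t0) / n
    ts, xs = [t0], [list(x0)]
    x = list(x0)
    for k in range(n):
        t = t0 + k * h
        k1 = f(t, x)
        k2 = f(t + h / 2, [xi + h / 2 * di for xi, di in zip(x, k1)])
        k3 = f(t + h / 2, [xi + h / 2 * di for xi, di in zip(x, k2)])
        k4 = f(t + h, [xi + h * di for xi, di in zip(x, k3)])
        x = [xi + h / 6 * (p + 2 * q + 2 * r + s)
             for xi, p, q, r, s in zip(x, k1, k2, k3, k4)]
        ts.append(t0 + (k + 1) * h)
        xs.append(x)
    return ts, xs

ts, xs = rk4_vector(df651, 0.0, 2.0, [1.0, -1.0], 45)
t_end = ts[-1]
exact = [t_end - 1 + 2 * math.exp(-t_end), 1 - 2 * math.exp(-t_end)]  # Eq. (6.5.12)
print(xs[-1], exact)
```

Because the state is handled as a list throughout, the same routine works unchanged for any number of state variables.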
%nm651_1 to solve a system of differential eqs., i.e., state equation
df = 'df651';
t0 = 0; tf = 2; x0 = [1 -1]; %start/final time and initial value
N = 45; [tH,xH] = ode_Ham(df,[t0 tf],x0,N); %with N = number of segments
[t45,x45] = ode45(df,[t0 tf],x0);
plot(tH,xH), hold on, pause, plot(t45,x45)

function dx = df651(t,x)
dx = zeros(size(x)); %row/column vector depending on the shape of x
dx(1) = x(2); dx(2) = -x(2) + 1;


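Especially for state equations having only constant coefficients, like Eq.
(6.5.2), we can rewrite them in a matrix-vector form as

$$\begin{bmatrix} x_1'(t)\\ x_2'(t)\end{bmatrix} = \begin{bmatrix} 0 & 1\\ 0 & -1\end{bmatrix}\begin{bmatrix} x_1(t)\\ x_2(t)\end{bmatrix} + \begin{bmatrix} 0\\ 1\end{bmatrix} u_s(t) \quad \text{with } \begin{bmatrix} x_1(0)\\ x_2(0)\end{bmatrix} = \begin{bmatrix} 1\\ -1\end{bmatrix},\; u_s(t) = 1 \;\; \forall\, t \ge 0 \tag{6.5.3}$$

$$x'(t) = Ax(t) + Bu(t) \quad \text{with the initial state } x(0) \text{ and the input } u(t) \tag{6.5.4}$$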


which is called a linear time-invariant (LTI) state equation, and then try to find 
the analytical solution. For this purpose, we take the Laplace transform of both 
sides to write 


$$sX(s) - x(0) = AX(s) + BU(s) \quad \text{with } X(s) = \mathcal{L}\{x(t)\},\; U(s) = \mathcal{L}\{u(t)\}$$

$$[sI - A]X(s) = x(0) + BU(s), \qquad X(s) = [sI - A]^{-1}x(0) + [sI - A]^{-1}BU(s) \tag{6.5.5}$$

where $\mathcal{L}\{x(t)\}$ and $\mathcal{L}^{-1}\{X(s)\}$ denote the Laplace transform of x(t) and the
inverse Laplace transform of X(s), respectively. Note that

$$[sI - A]^{-1} = s^{-1}[I - As^{-1}]^{-1} = s^{-1}[I + As^{-1} + A^2s^{-2} + \cdots]$$

$$\phi(t) = \mathcal{L}^{-1}\{[sI - A]^{-1}\} = I + At + \frac{A^2}{2!}t^2 + \frac{A^3}{3!}t^3 + \cdots = e^{At} \quad \text{with } \phi(0) = I \tag{6.5.6}$$

By applying the convolution property of the Laplace transform (Table D.2(4) in
Appendix D)

$$\mathcal{L}^{-1}\{[sI - A]^{-1}BU(s)\} = \mathcal{L}^{-1}\{[sI - A]^{-1}\} * \mathcal{L}^{-1}\{BU(s)\} = \phi(t) * Bu(t) = \int_{-\infty}^{\infty}\phi(t - \tau)Bu(\tau)\,d\tau = \int_0^t \phi(t - \tau)Bu(\tau)\,d\tau \tag{6.5.7}$$

(the integrand vanishes for τ < 0, where u(τ) = 0, and for τ > t, where φ(t − τ) = 0),
we can take the inverse Laplace transform of Eq. (6.5.5) to write

$$x(t) = \phi(t)x(0) + \phi(t) * Bu(t) = \phi(t)x(0) + \int_0^t \phi(t - \tau)Bu(\tau)\,d\tau \tag{6.5.8}$$

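For Eq. (6.5.3), we use Eq. (6.5.6) to find

$$\phi(t) = \mathcal{L}^{-1}\{[sI - A]^{-1}\} = \mathcal{L}^{-1}\left\{\begin{bmatrix} s & -1\\ 0 & s+1\end{bmatrix}^{-1}\right\} = \mathcal{L}^{-1}\left\{\frac{1}{s(s+1)}\begin{bmatrix} s+1 & 1\\ 0 & s\end{bmatrix}\right\} = \mathcal{L}^{-1}\left\{\begin{bmatrix} 1/s & 1/s - 1/(s+1)\\ 0 & 1/(s+1)\end{bmatrix}\right\} = \begin{bmatrix} 1 & 1 - e^{-t}\\ 0 & e^{-t}\end{bmatrix} \tag{6.5.9}$$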

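and use Eqs. (6.5.8) and (6.5.9) together with u(t) = u_s(t) = 1 ∀ t ≥ 0 to obtain

$$x(t) = \begin{bmatrix} 1 & 1 - e^{-t}\\ 0 & e^{-t}\end{bmatrix}\begin{bmatrix} 1\\ -1\end{bmatrix} + \int_0^t \begin{bmatrix} 1 & 1 - e^{-(t-\tau)}\\ 0 & e^{-(t-\tau)}\end{bmatrix}\begin{bmatrix} 0\\ 1\end{bmatrix} d\tau = \begin{bmatrix} e^{-t}\\ -e^{-t}\end{bmatrix} + \begin{bmatrix} t - 1 + e^{-t}\\ 1 - e^{-t}\end{bmatrix} = \begin{bmatrix} t - 1 + 2e^{-t}\\ 1 - 2e^{-t}\end{bmatrix} \tag{6.5.10}$$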

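Alternatively, we can directly take the inverse transform of Eq. (6.5.5) to get

$$X(s) = [sI - A]^{-1}\{x(0) + BU(s)\} = \frac{1}{s(s+1)}\begin{bmatrix} s+1 & 1\\ 0 & s\end{bmatrix}\begin{bmatrix} 1\\ -1 + 1/s\end{bmatrix} \quad \text{with } U(s) = \mathcal{L}\{u_s(t)\} = \frac{1}{s} \tag{6.5.11}$$

$$X_1(s) = \frac{s^2 + 1}{s^2(s+1)} = \frac{1}{s^2} - \frac{1}{s} + \frac{2}{s+1}, \qquad x_1(t) = t - 1 + 2e^{-t} \tag{6.5.12a}$$

$$X_2(s) = \frac{1 - s}{s(s+1)} = \frac{1}{s} - \frac{2}{s+1}, \qquad x_2(t) = 1 - 2e^{-t} \tag{6.5.12b}$$

which conforms with Eq. (6.5.10).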
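As a numerical cross-check of Eqs. (6.5.6) and (6.5.9), the sketch below sums the matrix exponential series for the system matrix of Eq. (6.5.3) and compares it with the closed form obtained above. It is a minimal Python illustration (standard library only) that assumes truncating the series after a fixed number of terms (30 here, an arbitrary choice) is adequate for moderate values of t; the helper names `mat_mul` and `expm_series` are hypothetical.

```python
import math

def mat_mul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_series(A, t, terms=30):
    """phi(t) = e^{At} = sum_{m>=0} (At)^m / m!, truncated after `terms` terms (Eq. 6.5.6)."""
    n = len(A)
    At = [[a * t for a in row] for row in A]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # (At)^0/0! = I
    total = [row[:] for row in term]
    for m in range(1, terms):
        term = mat_mul(term, At)
        term = [[v / m for v in row] for row in term]  # now (At)^m / m!
        total = [[s + v for s, v in zip(rs, rv)] for rs, rv in zip(total, term)]
    return total

A = [[0.0, 1.0], [0.0, -1.0]]  # system matrix of Eq. (6.5.3)
t = 1.5
phi = expm_series(A, t)
phi_closed = [[1.0, 1.0 - math.exp(-t)], [0.0, math.exp(-t)]]  # Eq. (6.5.9)
print(phi)
print(phi_closed)
```

For large ||A||t a plain truncated series loses accuracy, so this is only a check for small problems, not a general-purpose way of computing e^{At}.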

The MATLAB program “nm651_2.m” uses a symbolic computation routine
“ilaplace()” to get the inverse Laplace transform, uses “eval()” to evaluate
it, and plots the result as depicted in Fig. 6.5, which supports this derivation
procedure. Additionally, it uses another symbolic computation routine “dsolve()”
to get the analytical solution directly.

Figure 6.5 Numerical/analytical solutions of the continuous-time state equation (6.5.2)/(6.5.3).

>>nm651_2
Solution of Differential Equation based on Laplace transform
xs = [ 1/s + 1/s/(s + 1)*(-1 + 1/s) ]
     [ 1/(s + 1)*(-1 + 1/s) ]
xt = [ -1 + t + 2*exp(-t) ]
     [ -2*exp(-t) + 1 ]
Analytical solution
xt1 = -1 + t + 2*exp(-t)
xt2 = -2*exp(-t) + 1


%nm651_2: Analytical solution for state eq. x'(t) = Ax(t) + Bu(t) (6.5.3)
clear
syms s t %declare s,t as symbolic variables
A = [0 1;0 -1]; B = [0 1]'; %Eq.(6.5.3)
x0 = [1 -1]'; %initial value
disp('Solution of Differential Eq based on Laplace transform')
disp('Laplace transformed solution X(s)')
Xs = (s*eye(size(A)) - A)^-1*(x0 + B/s) %Eq.(6.5.5)
disp('Inverse Laplace transformed solution x(t)')
xt = ilaplace(Xs) %inverse Laplace transform %Eq.(6.5.12)
t0 = 0; tf = 2; N = 45; %initial/final time
t = t0 + [0:N]*(tf - t0)/N; %time (row) vector, so that eval(xt) gives one row per state
xtt = eval(xt); %evaluate the inverse Laplace transform
plot(t,xtt)
disp('Analytical solution')
xt = dsolve('Dx1 = x2, Dx2 = -x2 + 1', 'x1(0) = 1, x2(0) = -1');
xt1 = xt.x1, xt2 = xt.x2 %Eq.(6.5.10)
6.5.2 Discretization of LTI State Equation

In this section, we consider a method of converting a continuous-time LTI
(linear time-invariant) state equation

$$x'(t) = Ax(t) + Bu(t) \quad \text{with the initial state } x(0) \text{ and the input } u(t) \tag{6.5.13}$$

into an equivalent discrete-time LTI state equation with the sampling period T

$$x[n+1] = A_d x[n] + B_d u[n] \quad \text{with the initial state } x[0] \text{ and the input } u[n] = u(nT) \text{ for } nT \le t < (n+1)T \tag{6.5.14}$$

which can be solved easily by an iterative scheme requiring only simple
multiplications and additions.

For this purpose, we rewrite the solution (6.5.8) of the continuous-time LTI
state equation with the initial time $t_0$ as

$$x(t) = \phi(t - t_0)x(t_0) + \int_{t_0}^{t}\phi(t - \tau)Bu(\tau)\,d\tau \tag{6.5.15}$$


Under the assumption that the input is held constant at its initial value within each
sampling interval, that is, u[n] = u(nT) for nT ≤ t < (n+1)T, we substitute
$t_0 = nT$ and t = (n+1)T into this equation to write the discrete-time LTI state
equation as

$$\begin{aligned} x((n+1)T) &= \phi(T)x(nT) + \int_{nT}^{(n+1)T}\phi(nT + T - \tau)Bu(nT)\,d\tau\\ x[n+1] &= \phi(T)x[n] + \int_{nT}^{(n+1)T}\phi(nT + T - \tau)\,d\tau\,B\,u[n]\\ x[n+1] &= A_d x[n] + B_d u[n] \end{aligned} \tag{6.5.16}$$

where the discretized system matrices are

$$A_d = \phi(T) = e^{AT} \tag{6.5.17a}$$

$$B_d = \int_{nT}^{(n+1)T}\phi(nT + T - \tau)\,d\tau\,B \;\overset{\sigma = nT + T - \tau}{=}\; -\int_{T}^{0}\phi(\sigma)\,d\sigma\,B = \int_{0}^{T}\phi(\tau)\,d\tau\,B \tag{6.5.17b}$$

Here, let us consider another way of computing these system matrices, one that
is well suited to digital computers. It makes use of the series definition
of the matrix exponential function in Eq. (6.5.6) to rewrite Eq. (6.5.17) as

$$A_d = e^{AT} = \sum_{m=0}^{\infty}\frac{A^mT^m}{m!} = I + AT\sum_{m=0}^{\infty}\frac{A^mT^m}{(m+1)!} = I + A\Psi T \tag{6.5.18a}$$

$$B_d = \int_0^T \phi(\tau)\,d\tau\,B = \int_0^T \sum_{m=0}^{\infty}\frac{A^m\tau^m}{m!}\,d\tau\,B = \sum_{m=0}^{\infty}\frac{A^mT^{m+1}}{(m+1)!}B = \Psi TB \tag{6.5.18b}$$
where

$$\Psi = \sum_{m=0}^{\infty}\frac{A^mT^m}{(m+1)!} \cong I + \frac{AT}{2}\left(I + \frac{AT}{3}\left(I + \cdots + \frac{AT}{N}\left(I + \frac{AT}{N+1}\right)\cdots\right)\right) \quad \text{for } N \gg 1 \tag{6.5.19}$$
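The nested form (6.5.19) is essentially Horner's rule applied to the series for Ψ, so Ψ can be accumulated from the innermost factor outward using only matrix products and additions. A minimal Python rendering is sketched below. It assumes a single input (so B is a column, stored as a plain list) and the truncation N = 100 used by “c2d_steq()”; for the system of Eq. (6.5.3) it is checked against the closed-form matrices of Eq. (6.5.20). The sampling period T = 0.2 is only an illustrative choice.

```python
import math

def c2d_steq(A, B, T, N=100):
    """Discretize x' = Ax + Bu (input held constant over each period T), Eqs. (6.5.18)-(6.5.19).

    A is an n x n list of rows and B an n-element list (single input).
    Psi = I + (AT/2)(I + (AT/3)(...(I + AT/(N+1))...)) is built from the inside out.
    """
    n = len(A)
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    def mul(P, Q):
        return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    psi = [row[:] for row in I]
    for m in range(N, 0, -1):                      # mirrors: PSI = I + A*PSI*T/(m + 1)
        APsi = mul(A, psi)
        psi = [[I[i][j] + APsi[i][j] * T / (m + 1) for j in range(n)] for i in range(n)]
    APsi = mul(A, psi)
    Ad = [[I[i][j] + APsi[i][j] * T for j in range(n)] for i in range(n)]   # Eq. (6.5.18a)
    Bd = [T * sum(psi[i][k] * B[k] for k in range(n)) for i in range(n)]    # Eq. (6.5.18b)
    return Ad, Bd

T = 0.2  # illustrative sampling period
Ad, Bd = c2d_steq([[0.0, 1.0], [0.0, -1.0]], [0.0, 1.0], T)
eT = math.exp(-T)
print(Ad, [[1.0, 1.0 - eT], [0.0, eT]])   # numerical vs. Eq. (6.5.20a)
print(Bd, [T - 1.0 + eT, 1.0 - eT])       # numerical vs. Eq. (6.5.20b)
```

Evaluating the nested form costs N matrix products, and, unlike summing the terms one by one, it never needs explicit factorials.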

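Now, we apply these discretization formulas to the continuous-time state
equation (6.5.3)

$$\begin{bmatrix} x_1'(t)\\ x_2'(t)\end{bmatrix} = \begin{bmatrix} 0 & 1\\ 0 & -1\end{bmatrix}\begin{bmatrix} x_1(t)\\ x_2(t)\end{bmatrix} + \begin{bmatrix} 0\\ 1\end{bmatrix}u_s(t) \quad \text{with } \begin{bmatrix} x_1(0)\\ x_2(0)\end{bmatrix} = \begin{bmatrix} 1\\ -1\end{bmatrix}$$

to get the discretized system matrices and the discretized state equation as

$$\phi(t) = \mathcal{L}^{-1}\{[sI - A]^{-1}\} \overset{(6.5.9)}{=} \begin{bmatrix} 1 & 1 - e^{-t}\\ 0 & e^{-t}\end{bmatrix}$$

$$A_d \overset{(6.5.17a)}{=} \phi(T) = \begin{bmatrix} 1 & 1 - e^{-T}\\ 0 & e^{-T}\end{bmatrix} \tag{6.5.20a}$$

$$B_d \overset{(6.5.17b)}{=} \int_0^T \phi(\tau)\,d\tau\,B = \int_0^T \begin{bmatrix} 1 - e^{-\tau}\\ e^{-\tau}\end{bmatrix} d\tau = \begin{bmatrix} T - 1 + e^{-T}\\ 1 - e^{-T}\end{bmatrix} \tag{6.5.20b}$$

$$x[n+1] = A_d x[n] + B_d u[n]: \quad \begin{bmatrix} x_1[n+1]\\ x_2[n+1]\end{bmatrix} = \begin{bmatrix} 1 & 1 - e^{-T}\\ 0 & e^{-T}\end{bmatrix}\begin{bmatrix} x_1[n]\\ x_2[n]\end{bmatrix} + \begin{bmatrix} T - 1 + e^{-T}\\ 1 - e^{-T}\end{bmatrix}u[n] \tag{6.5.21}$$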


We don’t need any special algorithm other than an iterative scheme to solve
this discrete-time state equation. The formulas (6.5.18a,b) for computing the
discretized system matrices are cast into the routine “c2d_steq()”. The program
“nm652.m” discretizes the continuous-time state equation (6.5.3) by using
this routine and, alternatively, the MATLAB built-in routine “c2d()”. It then solves
the discretized state equation and plots the results as in Fig. 6.6. As long as
the assumption that u[n] = u(nT) for nT ≤ t < (n+1)T is valid, the solution
x[n] of the discretized state equation is expected to match the solution x(t) of
the continuous-time state equation at every sampling instant t = nT, and its
staircase graph comes closer to x(t) for all t as the sampling interval T gets
shorter (see Fig. 6.6).

Figure 6.6 The solution of the discretized state equation (6.5.21).


%nm652.m
% discretize a state eqn x'(t) = Ax(t) + Bu(t) to x[n + 1] = Ad*x[n] + Bd*u[n]
clear, clf
A = [0 1;0 -1]; B = [0;1]; %Eq.(6.5.3)
x0 = [1 -1]; t0 = 0; tf = 2; %initial value and time span
T = 0.2; %sampling interval(period): illegible in the source, 0.2 is an assumed example value
eT = exp(-T);
Ad = [1 1 - eT; 0 eT] %discretized system matrices obtained analytically
Bd = [T + eT - 1; 1 - eT] %Eq.(6.5.21)
[Ad,Bd] = c2d_steq(A,B,T,100) %continuous-to-discrete conversion
[Ad1,Bd1] = c2d(A,B,T) %by the built-in routine
t(1) = 0; xd(1,:) = x0; %initial time and initial value
for k = 1:round((tf - t0)/T) %solve the discretized state equation
  t(k + 1) = k*T; xd(k + 1,:) = xd(k,:)*Ad' + Bd';
end
stairs([0; t'],[x0; xd]), hold on %stairstep graph
N = 100; t = t0 + [0:N]'*(tf - t0)/N; %time (column) vector
x(:,1) = t - 1 + 2*exp(-t); %analytical solution
x(:,2) = 1 - 2*exp(-t); %Eq.(6.5.12)
plot(t,x)

function [Ad,Bd] = c2d_steq(A,B,T,N)
if nargin < 4, N = 100; end
I = eye(size(A,2)); PSI = I;
for m = N:-1:1, PSI = I + A*PSI*T/(m + 1); end %Eq.(6.5.19)
Ad = I + A*PSI*T; Bd = PSI*T*B; %Eq.(6.5.18)
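The iteration performed by the for loop in “nm652.m” can be mirrored in a few lines of Python. The sketch below (standard library only) uses the closed-form matrices (6.5.20a,b) with the unit-step input of Eq. (6.5.3) and compares x[n] with the analytical solution (6.5.12) at each sampling instant. Because the step input really is constant over every sampling interval, the two should agree to rounding error, which illustrates why x[n] matches x(nT). The sampling period T = 0.2 and the function names are assumptions made for the illustration.

```python
import math

def simulate_discrete(T, tf=2.0):
    """Iterate x[n+1] = Ad x[n] + Bd u[n] (Eq. 6.5.21) with u[n] = 1 and x[0] = [1, -1]."""
    eT = math.exp(-T)
    Ad = [[1.0, 1.0 - eT], [0.0, eT]]   # Eq. (6.5.20a)
    Bd = [T - 1.0 + eT, 1.0 - eT]      # Eq. (6.5.20b)
    x = [1.0, -1.0]
    history = [(0.0, x)]
    for n in range(round(tf / T)):
        u = 1.0  # unit-step input, constant over each sampling interval
        x = [Ad[0][0] * x[0] + Ad[0][1] * x[1] + Bd[0] * u,
             Ad[1][0] * x[0] + Ad[1][1] * x[1] + Bd[1] * u]
        history.append(((n + 1) * T, x))
    return history

def exact(t):
    """Analytical solution (6.5.12) of the continuous-time equation (6.5.3)."""
    return [t - 1.0 + 2.0 * math.exp(-t), 1.0 - 2.0 * math.exp(-t)]

for t, x in simulate_discrete(0.2):
    xe = exact(t)
    print(f"t = {t:.1f}  x[n] = ({x[0]:.6f}, {x[1]:.6f})  x(nT) = ({xe[0]:.6f}, {xe[1]:.6f})")
```

If the input varied within each interval, the samples would instead differ from x(nT) by an amount that shrinks as T is made smaller.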


6.5.3 High-Order Differential Equation to State Equation 

Suppose we are given an Nth-order scalar differential equation together with the
initial values of the variable and its derivatives up to order N − 1, which is
called an IVP (Initial Value Problem):

$$[\text{IVP}]_N: \; x^{(N)}(t) = f(t, x(t), x'(t), x^{(2)}(t), \ldots, x^{(N-1)}(t)) \quad \text{with the initial values } x(t_0) = x_{10},\; x'(t_0) = x_{20},\; \ldots,\; x^{(N-1)}(t_0) = x_{N0} \tag{6.5.22}$$


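Defining the state vector and the initial state as

$$x(t) = \begin{bmatrix} x_1 = x\\ x_2 = x'\\ x_3 = x^{(2)}\\ \vdots\\ x_N = x^{(N-1)}\end{bmatrix}, \qquad x(t_0) = \begin{bmatrix} x_{10}\\ x_{20}\\ x_{30}\\ \vdots\\ x_{N0}\end{bmatrix} \tag{6.5.23}$$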


we can rewrite Eq. (6.5.22) in the form of a first-order vector differential
equation, that is, a state equation, as

$$\begin{bmatrix} x_1'(t)\\ x_2'(t)\\ \vdots\\ x_{N-1}'(t)\\ x_N'(t)\end{bmatrix} = \begin{bmatrix} x_2(t)\\ x_3(t)\\ \vdots\\ x_N(t)\\ f(t, x(t), x'(t), x^{(2)}(t), \ldots, x^{(N-1)}(t))\end{bmatrix}$$

$$x'(t) = f(t, x(t)) \quad \text{with } x(t_0) = x_0 \tag{6.5.24}$$


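For example, we can convert a third-order scalar differential equation

$$x^{(3)}(t) + a_2x^{(2)}(t) + a_1x'(t) + a_0x(t) = u(t)$$

into a state equation of the form

$$\begin{bmatrix} x_1'(t)\\ x_2'(t)\\ x_3'(t)\end{bmatrix} = \begin{bmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ -a_0 & -a_1 & -a_2\end{bmatrix}\begin{bmatrix} x_1(t)\\ x_2(t)\\ x_3(t)\end{bmatrix} + \begin{bmatrix} 0\\ 0\\ 1\end{bmatrix}u(t) \tag{6.5.25a}$$

$$x(t) = \begin{bmatrix} 1 & 0 & 0\end{bmatrix}\begin{bmatrix} x_1(t)\\ x_2(t)\\ x_3(t)\end{bmatrix} \tag{6.5.25b}$$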
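The pattern of Eq. (6.5.25) extends to any order N: ones sit on the superdiagonal and the negated coefficients fill the last row. The following Python sketch (standard library only) builds these matrices and integrates the resulting state equation with a plain RK4 loop. The test equation $x''' + 6x'' + 11x' + 6x = 6$ with zero initial conditions is a hypothetical example, chosen because its exact solution $x(t) = 1 - 3e^{-t} + 3e^{-2t} - e^{-3t}$ is easy to verify; the function names are likewise illustrative.

```python
import math

def companion(a):
    """State matrices of Eq. (6.5.25) generalized to order N.

    a = [a0, a1, ..., a_{N-1}] are the coefficients of
    x^(N) + a_{N-1} x^(N-1) + ... + a1 x' + a0 x = u.
    States: x1 = x, x2 = x', ..., xN = x^(N-1). Returns (A, B, C).
    """
    N = len(a)
    A = [[0.0] * N for _ in range(N)]
    for i in range(N - 1):
        A[i][i + 1] = 1.0              # x_i' = x_{i+1}
    A[N - 1] = [-c for c in a]         # last row: -a0, -a1, ..., -a_{N-1}
    B = [0.0] * (N - 1) + [1.0]
    C = [1.0] + [0.0] * (N - 1)        # output x = x1, Eq. (6.5.25b)
    return A, B, C

def simulate(A, B, u, x0, tf, n):
    """RK4 integration of x' = Ax + Bu(t) from t = 0 to tf in n steps; returns x(tf)."""
    N = len(x0)

    def f(t, x):
        return [sum(A[i][j] * x[j] for j in range(N)) + B[i] * u(t) for i in range(N)]

    h = tf / n
    x = list(x0)
    for k in range(n):
        t = k * h
        k1 = f(t, x)
        k2 = f(t + h / 2, [xi + h / 2 * d for xi, d in zip(x, k1)])
        k3 = f(t + h / 2, [xi + h / 2 * d for xi, d in zip(x, k2)])
        k4 = f(t + h, [xi + h * d for xi, d in zip(x, k3)])
        x = [xi + h / 6 * (p + 2 * q + 2 * r + s) for xi, p, q, r, s in zip(x, k1, k2, k3, k4)]
    return x

A, B, C = companion([6.0, 11.0, 6.0])          # a0 = 6, a1 = 11, a2 = 6
xT = simulate(A, B, lambda t: 6.0, [0.0, 0.0, 0.0], 1.0, 200)
y = sum(c * xi for c, xi in zip(C, xT))        # x(1) read out through C
exact = 1 - 3 * math.exp(-1) + 3 * math.exp(-2) - math.exp(-3)
print(y, exact)
```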


6.5.4 Stiff Equation 

Suppose that we are given a vector differential equation involving more than one
dependent variable with respect to the independent variable t. If the magnitudes
of the derivatives of the dependent variables with respect to t (corresponding
to their rates of change) are significantly different, such a differential equation is
said to be stiff because it is difficult to solve numerically. For such a stiff
differential equation, we should be very careful in choosing the step size in order
to avoid numerical instability and to get a reasonably accurate solution
within a reasonable computation time. Why? Because a small step size is needed
to capture the rapidly changing variables, and with such a small step size it takes
a lot of computation to follow the slowly changing variables over the long time
span during which they evolve.

Actually, there is no clear distinction between stiff and non-stiff differential
equations, since stiffness is a matter of degree. Then, is there any way to
estimate the degree of stiffness of a given differential equation? The answer is
yes, if the differential equation can be arranged into an LTI state equation like
Eq. (6.5.4), whose solution consists of modes $e^{\lambda_i t}$ determined by the
eigenvalues $\lambda_i$ of the system matrix A. For example, the system matrix of
Eq. (6.5.3) has the eigenvalues

$$|sI - A| = \det\begin{bmatrix} s & -1\\ 0 & s+1\end{bmatrix} = s(s+1) = 0, \qquad s = 0 \text{ and } s = -1$$

which can be observed as the exponents of the two terms $1 = e^{0t}$ and $e^{-t}$ in
the solution (6.5.12). In this context, a measure of stiffness is the ratio of the
maximum to the minimum among the absolute values of the (negative) real parts
of the eigenvalues of the system matrix A:

$$\frac{\text{Max}\{|\text{Re}(\lambda_i)|\}}{\text{Min}\{|\text{Re}(\lambda_i)| \ne 0\}} \tag{6.5.26}$$

This can be thought of as the degree of imbalance between the fast mode and
the slow mode.
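Equation (6.5.26) is straightforward to evaluate once the eigenvalues are known. The sketch below is a minimal Python illustration (standard library only) restricted to 2 × 2 matrices, whose eigenvalues follow from the characteristic polynomial $s^2 - \text{tr}(A)s + \det(A)$; zero eigenvalues are excluded from the denominator as in Eq. (6.5.26). Applied to the system matrix of Eq. (6.5.3), the ratio is 1 (no stiffness at all), whereas the diagonal matrix diag(-1, -1000), a hypothetical example, gives a ratio of 1000.

```python
import cmath

def eig2(A):
    """Eigenvalues of a real 2x2 matrix from s^2 - tr(A) s + det(A) = 0."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return [(tr + disc) / 2.0, (tr - disc) / 2.0]

def stiffness_ratio(eigs, zero_tol=1e-12):
    """Eq. (6.5.26): Max{|Re(lambda_i)|} / Min{|Re(lambda_i)| != 0}."""
    re = [abs(lam.real) for lam in eigs]
    nonzero = [r for r in re if r > zero_tol]
    return max(re) / min(nonzero)

A_653 = [[0.0, 1.0], [0.0, -1.0]]          # system matrix of Eq. (6.5.3)
A_stiff = [[-1.0, 0.0], [0.0, -1000.0]]    # hypothetical stiff example
print(eig2(A_653), stiffness_ratio(eig2(A_653)))
print(eig2(A_stiff), stiffness_ratio(eig2(A_stiff)))
```

For larger systems the eigenvalues would come from a numerical eigenvalue routine (eig() in MATLAB); only the ratio itself is defined by Eq. (6.5.26).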

Now, what we must know is how to handle stiff differential equations. Fortunately,
MATLAB has several built-in routines, such as “ode15s()”, “ode23s()”,
“ode23t()”, and “ode23tb()”, which are designed to deal with stiff differential
equations efficiently. One may use the help command to see their detailed
usages. Let’s apply them to the Van der Pol equation

$$\frac{d^2y(t)}{dt^2} - \mu(1 - y^2(t))\frac{dy(t)}{dt} + y(t) = 0 \quad \text{with } y(0) = 2,\; \frac{dy(0)}{dt} = 0 \tag{6.5.27a}$$

which can be written in the form of a state equation as

$$\begin{bmatrix} x_1'(t)\\ x_2'(t)\end{bmatrix} = \begin{bmatrix} x_2(t)\\ \mu(1 - x_1^2(t))x_2(t) - x_1(t)\end{bmatrix} \quad \text{with } \begin{bmatrix} x_1(0)\\ x_2(0)\end{bmatrix} = \begin{bmatrix} 2\\ 0\end{bmatrix} \tag{6.5.27b}$$

For this job, we defined this equation in an M-file named “df_van.m” and made
the MATLAB program “nm654.m”, where we declared the parameter μ (mu) as
a global variable so that it could be passed on to any related routines/functions
as well as “df_van.m”. At the beginning of the program, we set the global
parameter μ to 25 and applied “ode_Ham()” with the number of segments
N = 8700 and 9000. The results are depicted in Figs. 6.7a and 6.7b, which
show how crucial the choice of step size is for a stiff equation. Next, we
applied “ode45()” to obtain the solution depicted in Fig. 6.7c, which is almost
the same as Fig. 6.7b but took less than one fourth of the computation time
taken by “ode_Ham()”. This reveals the merit of the MATLAB built-in routines:
since the step size is determined adaptively inside them, they may save
computation time as well as spare us the trouble of choosing the step size.
Then, setting μ = 200, we applied the MATLAB built-in routines
“ode45()”/“ode23()”/“ode15s()”/“ode23s()”/“ode23t()”/“ode23tb()” to get
nearly identical results, as depicted in Fig. 6.7d, with the respective computation
times (in seconds)

time = 24.9530 14.9690 0.1880 0.2650 0.2500 0.2820

The computational advantage of “ode15s()”/“ode23s()”/“ode23t()”/
“ode23tb()” (designed specifically for handling stiff differential equations) over
“ode45()”/“ode23()” becomes prominent as the value of the parameter μ (mu) gets
large, reflecting high stiffness.

Figure 6.7 Numerical solutions of Van der Pol equation obtained by various routines: (a), (b) results obtained by ode_Ham() with N = 8700 and 9000 (μ = 25); (c) result obtained by ode45() (μ = 25); (d) results obtained by ode23(), ode23s(), ode23t(), ode23tb(), and ode15s() (μ = 200). Since the range of x1(t) is much smaller than that of x2(t), x1(t) is invisibly dwarfed by x2(t).
%nm654.m
% to solve a stiff differential eqn called Van der Pol equation
global mu
mu = 25, t0 = 0; tf = 100; tspan = [t0 tf]; x0 = [2 0];
[tH1,xH1] = ode_Ham('df_van',tspan,x0,8700);
subplot(221), plot(tH1,xH1)
tic,[tH2,xH2] = ode_Ham('df_van',tspan,x0,9000); time_Ham = toc
tic,[t45,x45] = ode45('df_van',tspan,x0); time_o45 = toc
subplot(222), plot(tH2,xH2), subplot(223), plot(t45,x45)
mu = 200; tf = 200; tspan = [t0 tf];
tic,[t45,x45] = ode45('df_van',tspan,x0); time(1) = toc;
tic,[t23,x23] = ode23('df_van',tspan,x0); time(2) = toc;
tic,[t15s,x15s] = ode15s('df_van',tspan,x0); time(3) = toc;
tic,[t23s,x23s] = ode23s('df_van',tspan,x0); time(4) = toc;
tic,[t23t,x23t] = ode23t('df_van',tspan,x0); time(5) = toc;
tic,[t23tb,x23tb] = ode23tb('df_van',tspan,x0); time(6) = toc;
subplot(224), plot(t45,x45, t23,x23, t15s,x15s, t23s,x23s, t23t,x23t, t23tb,x23tb)
disp(' ode45 ode23 ode15s ode23s ode23t ode23tb'), time

function dx = df_van(t,x)
%Van der Pol differential equation (6.5.27)
global mu
dx = zeros(size(x));
dx(1) = x(2); dx(2) = mu*(1 - x(1).^2).*x(2) - x(1);
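Why an explicit method is forced into tiny steps on a stiff problem can be seen on the scalar test equation y' = λ(y - 1), y(0) = 0, with a large negative λ (here λ = -1000, a hypothetical value). The minimal Python sketch below compares the explicit (forward) Euler method with the implicit (backward) Euler method. It only illustrates the principle: the MATLAB stiff solvers such as “ode15s()” and “ode23s()” use higher-order implicit or linearly implicit formulas, not backward Euler. With h = 0.01 the explicit update multiplies the error by 1 + hλ = -9 at every step and blows up, whereas the implicit update divides it by 1 - hλ = 11 and stays close to the exact value $1 - e^{\lambda t}$. The explicit method behaves only once h < 2/|λ| = 0.002.

```python
import math

LAM = -1000.0  # hypothetical fast eigenvalue of the test problem

def forward_euler(h, tf, y0=0.0):
    """Explicit Euler: y[k+1] = y[k] + h*LAM*(y[k] - 1)."""
    y = y0
    for _ in range(round(tf / h)):
        y = y + h * LAM * (y - 1.0)
    return y

def backward_euler(h, tf, y0=0.0):
    """Implicit Euler: y[k+1] = y[k] + h*LAM*(y[k+1] - 1), solved for y[k+1]."""
    y = y0
    for _ in range(round(tf / h)):
        y = (y - h * LAM) / (1.0 - h * LAM)
    return y

exact = 1.0 - math.exp(LAM * 1.0)   # essentially 1 at t = 1
print(forward_euler(0.01, 1.0), backward_euler(0.01, 1.0), exact)
print(forward_euler(0.0001, 1.0))   # explicit Euler needs h < 2/|LAM| to stay stable
```

For a linear problem the implicit step reduces to a division; for a nonlinear system such as Eq. (6.5.27b), each implicit step instead requires solving a (usually Newton-linearized) system of equations, which is the extra work the stiff solvers accept in exchange for large stable steps.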


6.6 BOUNDARY VALUE PROBLEM (BVP) 

A boundary value problem (BVP) is an Nth-order differential equation with some
of the values of the dependent variable x(t) and its derivatives specified at the initial
time $t_0$ and others specified at the final time $t_f$:

$$[\text{BVP}]_N: \; x^{(N)}(t) = f(t, x(t), x'(t), x^{(2)}(t), \ldots, x^{(N-1)}(t)) \quad \text{with the boundary values } x(t_1) = x_{10},\; x'(t_2) = x_{21},\; \ldots,\; x^{(N-1)}(t_N) = x_{N1} \tag{6.6.1}$$

In some cases, relations between the initial values and the final values may
be given as a mixed-boundary condition instead of the initial/final values
themselves. This section covers the shooting method and the finite difference method,
which can be used to solve a second-order BVP such as

$$[\text{BVP}]_2: \; x''(t) = f(t, x(t), x'(t)) \quad \text{with } x(t_0) = x_0,\; x(t_f) = x_f \tag{6.6.2}$$

6.6.1 Shooting Method 

The idea of this method is to assume the value of $x'(t_0)$, solve the differential
equation (IVP) with the initial condition $[x(t_0)\; x'(t_0)]$, and keep adjusting the value
of $x'(t_0)$ and solving the IVP repeatedly until the final value $x(t_f)$ of the solution
matches the given boundary value $x_f$ with enough accuracy. It is similar to
adjusting the firing angle of a cannon so that the shell eventually hits the target,
and that is why this method is called the shooting method. This can be viewed
as a nonlinear equation problem if we regard $x'(t_0)$ as the independent variable
and the difference between the resulting final value $x(t_f)$ and the desired one $x_f$
as a (mismatching) function of $x'(t_0)$. So the solution scheme can be systematized
by using the secant method (Section 4.5) and is cast into the MATLAB routine
“bvp2_shoot()”.

(cf) We might have to adjust the shooting position with the angle fixed, instead of adjusting
the shooting angle with the position fixed, or deal with mixed-boundary
conditions. See Problems 6.6, 6.7, and 6.8.

For example, let’s consider a BVP consisting of the second-order differential
equation

$$x''(t) = 2x^2(t) + 4t\,x(t)x'(t) \quad \text{with } x(0) = \frac{1}{4},\; x(1) = \frac{1}{3} \tag{6.6.3}$$


function [t,x] = bvp2_shoot(f,t0,tf,x0,xf,N,tol,kmax)
%To solve BVP2: [x1,x2]' = f(t,x1,x2) with x1(t0) = x0, x1(tf) = xf
if nargin < 8, kmax = 10; end
if nargin < 7, tol = 1e-8; end
if nargin < 6, N = 100; end
dx0(1) = (xf - x0)/(tf - t0); % the initial guess of x'(t0)
[t,x] = ode_RK4(f,[t0 tf],[x0 dx0(1)],N); % start up with RK4
plot(t,x(:,1)), hold on
e(1) = x(end,1) - xf; % x(tf) - xf: the 1st mismatching (deviation)
dx0(2) = dx0(1) - 0.1*sign(e(1));
for k = 2:kmax - 1
  [t,x] = ode_RK4(f,[t0 tf],[x0 dx0(k)],N);
  plot(t,x(:,1))
  %difference between the resulting final value and the target one
  e(k) = x(end,1) - xf; % x(tf) - xf
  ddx = dx0(k) - dx0(k - 1); % difference between successive derivatives
  if abs(e(k)) < tol | abs(ddx) < tol, break; end
  deddx = (e(k) - e(k - 1))/ddx; % the gradient of mismatching error
  dx0(k + 1) = dx0(k) - e(k)/deddx; %move by secant method
end

%do_shoot to solve BVP2 by the shooting method
t0 = 0; tf = 1; x0 = 1/4; xf = 1/3; %initial/final times and positions
N = 100; tol = 1e-8; kmax = 10;
[t,x] = bvp2_shoot('df661',t0,tf,x0,xf,N,tol,kmax);
xo = 1./(4 - t.*t); err = norm(x(:,1) - xo)/(N + 1)
plot(t,x(:,1),'b', t,xo,'r') %compare with true solution (6.6.4)

function dx = df661(t,x) %Eq.(6.6.5)
dx = zeros(size(x)); %row/column vector depending on the shape of x
dx(1) = x(2); dx(2) = (2*x(1) + 4*t*x(2))*x(1);
The solution x(t) and its derivative x'(t) are known as

$$x(t) = \frac{1}{4 - t^2} \quad \text{and} \quad x'(t) = \frac{2t}{(4 - t^2)^2} = 2t\,x^2(t) \tag{6.6.4}$$

Note that this second-order differential equation can be written in the form of
a state equation as

$$\begin{bmatrix} x_1'(t)\\ x_2'(t)\end{bmatrix} = \begin{bmatrix} x_2(t)\\ 2x_1^2(t) + 4t\,x_1(t)x_2(t)\end{bmatrix} \tag{6.6.5}$$

In order to apply the shooting method, we set the initial guess of $x_2(0) = x'(0)$ to

$$dx0[1] = x_2(0) = \frac{x_f - x_0}{t_f - t_0} \tag{6.6.6}$$

and solve the state equation with the initial condition $[x_1(0)\;\; x_2(0) = dx0[1]]$.
Then, depending on the sign of the difference e(1) between the final value $x_1(1)$
of the solution and the target final value $x_f$, we make the next guess dx0[2]
smaller than the initial guess dx0[1] if e(1) > 0 (overshoot) and larger if e(1) < 0,
and solve the state equation again with the initial condition $[x_1(0)\;\; dx0[2]]$.
We can start up the secant method with the two initial values dx0[1] and dx0[2]
and repeat the iteration until the difference (error) e(k) becomes sufficiently small.
For this job, we compose the MATLAB program “do_shoot.m”, which uses the routine
“bvp2_shoot()” to get the numerical solution and compares it with the true
analytical solution. Figure 6.8 shows that the numerical solution gets closer to
the true analytical solution after each round of adjustment.
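The loop in “bvp2_shoot()” can be restated compactly in Python. The sketch below is a minimal illustration (standard library only), not the book's routine. It integrates the state equation (6.6.5) with a fixed-step RK4 integrator, starts from the initial guess (6.6.6), nudges the second guess by 0.1 against the sign of the first mismatch, and then applies the secant update until the mismatch $x_1(t_f) - x_f$ falls below the tolerance. The function names are hypothetical. The recovered trajectory is compared with the analytical solution (6.6.4), for which x'(0) = 0.

```python
def f661(t, x):
    """State form (6.6.5) of x'' = 2x^2 + 4t x x'."""
    return [x[1], (2.0 * x[0] + 4.0 * t * x[1]) * x[0]]

def rk4(f, t0, tf, x0, n):
    """Fixed-step classical RK4; returns the time grid and the state vectors."""
    h = (tf - t0) / n
    ts, xs, x = [t0], [list(x0)], list(x0)
    for k in range(n):
        t = t0 + k * h
        k1 = f(t, x)
        k2 = f(t + h / 2, [a + h / 2 * b for a, b in zip(x, k1)])
        k3 = f(t + h / 2, [a + h / 2 * b for a, b in zip(x, k2)])
        k4 = f(t + h, [a + h * b for a, b in zip(x, k3)])
        x = [a + h / 6 * (p + 2 * q + 2 * r + s) for a, p, q, r, s in zip(x, k1, k2, k3, k4)]
        ts.append(t0 + (k + 1) * h)
        xs.append(x)
    return ts, xs

def bvp2_shoot(f, t0, tf, x0, xf, n=100, tol=1e-8, kmax=10):
    """Shooting method: adjust x'(t0) by the secant method until x(tf) = xf."""
    dx0 = [(xf - x0) / (tf - t0)]                 # initial guess, Eq. (6.6.6)
    ts, xs = rk4(f, t0, tf, [x0, dx0[0]], n)
    e = [xs[-1][0] - xf]                          # first mismatch x(tf) - xf
    sign = (e[0] > 0) - (e[0] < 0)
    dx0.append(dx0[0] - 0.1 * sign)               # second guess against the sign of e
    for k in range(1, kmax):
        ts, xs = rk4(f, t0, tf, [x0, dx0[k]], n)
        e.append(xs[-1][0] - xf)
        ddx = dx0[k] - dx0[k - 1]
        if abs(e[k]) < tol or abs(ddx) < tol:
            break
        dx0.append(dx0[k] - e[k] * ddx / (e[k] - e[k - 1]))   # secant step
    return ts, xs

ts, xs = bvp2_shoot(f661, 0.0, 1.0, 0.25, 1.0 / 3.0)
err = max(abs(x[0] - 1.0 / (4.0 - t * t)) for t, x in zip(ts, xs))
print("x'(0) found:", xs[0][1], " max error:", err)
```

Each secant step costs one full IVP solution, which is the reason given in the (Q)/(A) remark for preferring it to the Newton method here.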


(Q) Why don’t we use the Newton method (Section 4.4)? 

(A) Because, in order to use the Newton method in the shooting method, we need IVP 
solutions instead of function evaluations to find the numerical Jacobian at every 
iteration, which will require much longer computation time. 



Figure 6.8 The solution of a BVP obtained by using the shooting method. 
6.6.2 Finite Difference Method

The idea of this method is to divide the whole interval $[t_0, t_f]$ into N segments
of width $h = (t_f - t_0)/N$ and approximate the first and second derivatives in the
differential equation at each grid point by the central difference formulas. This
leads to a tridiagonal system of equations with respect to the (N − 1) variables
$\{x_i = x(t_0 + ih),\; i = 1, \ldots, N-1\}$. However, in order for this system of equations to
be solved easily, it should be linear, implying that its coefficients may not contain
any term of x.

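For example, let’s consider a BVP consisting of the second-order linear differential
equation

$$x''(t) + a_1(t)x'(t) + a_0(t)x(t) = u(t) \quad \text{with } x(t_0) = x_0,\; x(t_f) = x_f \tag{6.6.7}$$

According to the finite difference method, we divide the solution interval
$[t_0, t_f]$ into N segments and convert the differential equation at each grid point
$t_i = t_0 + ih$ into a difference equation. Substituting the central difference
approximations $x'(t_i) \cong (x_{i+1} - x_{i-1})/(2h)$ and $x''(t_i) \cong (x_{i+1} - 2x_i + x_{i-1})/h^2$
and multiplying both sides by $2h^2$ gives

$$(2 - ha_{1i})x_{i-1} + (-4 + 2h^2a_{0i})x_i + (2 + ha_{1i})x_{i+1} = 2h^2u_i \tag{6.6.8}$$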


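Then, taking account of the boundary conditions $x_0 = x(t_0)$ and $x_N = x(t_f)$,
we collect all of the (N − 1) equations to construct a tridiagonal system of
equations as

$$\begin{bmatrix} -4 + 2h^2a_{01} & 2 + ha_{11} & 0 & \cdots & 0 & 0 & 0\\ 2 - ha_{12} & -4 + 2h^2a_{02} & 2 + ha_{12} & \cdots & 0 & 0 & 0\\ 0 & 2 - ha_{13} & -4 + 2h^2a_{03} & \cdots & 0 & 0 & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\ 0 & 0 & 0 & \cdots & -4 + 2h^2a_{0,N-3} & 2 + ha_{1,N-3} & 0\\ 0 & 0 & 0 & \cdots & 2 - ha_{1,N-2} & -4 + 2h^2a_{0,N-2} & 2 + ha_{1,N-2}\\ 0 & 0 & 0 & \cdots & 0 & 2 - ha_{1,N-1} & -4 + 2h^2a_{0,N-1} \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ x_3\\ \vdots\\ x_{N-3}\\ x_{N-2}\\ x_{N-1}\end{bmatrix} = \begin{bmatrix} 2h^2u_1 - (2 - ha_{11})x_0\\ 2h^2u_2\\ 2h^2u_3\\ \vdots\\ 2h^2u_{N-3}\\ 2h^2u_{N-2}\\ 2h^2u_{N-1} - (2 + ha_{1,N-1})x_N \end{bmatrix} \tag{6.6.9}$$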


This can be solved efficiently by using the MATLAB routine “trid()”, which 
is dedicated to a tridiagonal system of linear equations. 
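Because each row of (6.6.9) has at most three nonzero entries, the elimination in “trid()” only ever touches the subdiagonal, the diagonal, and the superdiagonal, so the work grows linearly with N. The following Python sketch of this tridiagonal elimination (often called the Thomas algorithm) stores just the three diagonals as lists rather than the full matrix. Like “trid()”, it does no pivoting and therefore assumes the pivots stay nonzero during elimination (true, for instance, for diagonally dominant systems). The three-list argument layout is an assumption of this sketch.

```python
def trid(sub, diag, sup, rhs):
    """Solve a tridiagonal system without pivoting.

    Row i reads: sub[i]*x[i-1] + diag[i]*x[i] + sup[i]*x[i+1] = rhs[i],
    where sub[0] and sup[-1] are ignored.
    """
    n = len(diag)
    d, b = list(diag), list(rhs)          # work on copies
    for m in range(1, n):                 # upper triangularization
        tmp = sub[m] / d[m - 1]
        d[m] -= sup[m - 1] * tmp
        b[m] -= b[m - 1] * tmp
    x = [0.0] * n
    x[-1] = b[-1] / d[-1]
    for m in range(n - 2, -1, -1):        # back substitution
        x[m] = (b[m] - sup[m] * x[m + 1]) / d[m]
    return x

# [2 -1 0; -1 2 -1; 0 -1 2] x = [1; 0; 1] has the solution x = [1; 1; 1]
print(trid([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, 0.0], [1.0, 0.0, 1.0]))
```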
The whole procedure of the finite difference method for solving a second-order
linear differential equation with boundary conditions is cast into the MATLAB
routine “bvp2_fdf()”. This routine is designed to accept the two coefficients
$a_1$ and $a_0$ and the right-hand-side input u of Eq. (6.6.7) as its first three input
arguments, where any of those three input arguments can be given as a function
(name) in case the corresponding term is not a numeric value but a function
of time t. We make the program “do_fdf.m” to use this routine for solving the
second-order BVP

$$x''(t) + \frac{2}{t}x'(t) - \frac{2}{t^2}x(t) = 0 \quad \text{with } x(1) = 5,\; x(2) = 3 \tag{6.6.10}$$


function [t,x] = bvp2_fdf(a1,a0,u,t0,tf,x0,xf,N)
% solve BVP2: x" + a1*x' + a0*x = u with x(t0) = x0, x(tf) = xf
% by the finite difference method
h = (tf - t0)/N; h2 = 2*h*h;
t = t0 + [0:N]'*h;
if ~isnumeric(a1), a1 = feval(a1,t(2:N)); %if a1 = (name of) a function of t
elseif length(a1) == 1, a1 = a1*ones(N - 1,1);
end
if ~isnumeric(a0), a0 = feval(a0,t(2:N)); %if a0 = (name of) a function of t
elseif length(a0) == 1, a0 = a0*ones(N - 1,1);
end
if ~isnumeric(u), u = feval(u,t(2:N)); %if u = (name of) a function of t
elseif length(u) == 1, u = u*ones(N - 1,1);
else u = u(:);
end
A = zeros(N - 1,N - 1); b = h2*u;
ha = h*a1(1); A(1,1:2) = [-4 + h2*a0(1) 2 + ha];
b(1) = b(1) + (ha - 2)*x0;
for m = 2:N - 2 %Eq.(6.6.9)
  ha = h*a1(m); A(m,m - 1:m + 1) = [2 - ha -4 + h2*a0(m) 2 + ha];
end
ha = h*a1(N - 1); A(N - 1,N - 2:N - 1) = [2 - ha -4 + h2*a0(N - 1)];
b(N - 1) = b(N - 1) - (ha + 2)*xf;
x = [x0 trid(A,b)' xf]';

function x = trid(A,b)
% solve tridiagonal system of equations
N = size(A,2);
for m = 2:N % Upper Triangularization
  tmp = A(m,m - 1)/A(m - 1,m - 1);
  A(m,m) = A(m,m) - A(m - 1,m)*tmp; A(m,m - 1) = 0;
  b(m,:) = b(m,:) - b(m - 1,:)*tmp;
end
x(N,:) = b(N,:)/A(N,N);
for m = N - 1: -1: 1 % Back Substitution
  x(m,:) = (b(m,:) - A(m,m + 1)*x(m + 1,:))/A(m,m);
end








%do_fdf to solve BVP2 by the finite difference method
clear, clf
t0 = 1; x0 = 5; tf = 2; xf = 3; N = 100;
a1 = inline('2./t','t'); a0 = inline('-2./t./t','t'); u = 0; %Eq.(6.6.10)
[tt,x] = bvp2_fdf(a1,a0,u,t0,tf,x0,xf,N);
%use the MATLAB built-in command 'bvp4c()'
df = inline('[x(2); 2./t.*(x(1)./t - x(2))]','t','x');
fbc = inline('[x0(1) - 5; xf(1) - 3]','x0','xf');
solinit = bvpinit(linspace(t0,tf,5),[1 10]); %initial solution interval
sol = bvp4c(df,fbc,solinit,bvpset('RelTol',1e-4));
x_bvp = deval(sol,tt); xbv = x_bvp(1,:)';
%use the symbolic computation command 'dsolve()'
xo = dsolve('D2x + 2*(Dx - x/t)/t = 0','x(1) = 5, x(2) = 3')
xot = double(subs(xo,'t',tt)); %xot = 4./tt./tt + tt; %true analytical solution
err_fd = norm(x - xot)/(N + 1) %error between numerical/analytical solution
err_bvp = norm(xbv - xot)/(N + 1)
plot(tt,x,'b',tt,xbv,'r',tt,xot,'k') %compare with analytical solution


We ran the program to get the results depicted in Fig. 6.9 and, additionally, used the
symbolic computation commands “dsolve()” and “subs()” to get the analytical
solution

$$x(t) = t + \frac{4}{t^2} \tag{6.6.11}$$

and to substitute the time vector into the analytical solution to obtain its numeric
values for checking.
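To see the whole finite difference procedure and its accuracy in one place, here is a minimal Python sketch (standard library only). It assembles the rows of Eq. (6.6.8) for the BVP (6.6.10), moves the boundary values to the right-hand side as in Eq. (6.6.9), and solves the tridiagonal system by elimination without pivoting. It is not the book's “bvp2_fdf()”: the coefficients are passed as Python callables, and the names are hypothetical. Comparing the result with the analytical solution (6.6.11) for N = 50 and N = 100 should show the error shrinking by roughly a factor of four when h is halved, as expected from second-order central differences.

```python
def bvp2_fdf(a1, a0, u, t0, tf, x0, xf, N):
    """Finite difference solution of x'' + a1(t)x' + a0(t)x = u(t), x(t0) = x0, x(tf) = xf."""
    h = (tf - t0) / N
    t = [t0 + i * h for i in range(N + 1)]
    sub, diag, sup, rhs = [], [], [], []
    for i in range(1, N):                        # interior grid points, Eq. (6.6.8)
        ha = h * a1(t[i])
        sub.append(2.0 - ha)
        diag.append(-4.0 + 2.0 * h * h * a0(t[i]))
        sup.append(2.0 + ha)
        rhs.append(2.0 * h * h * u(t[i]))
    rhs[0] -= sub[0] * x0                        # known boundary values go to the RHS
    rhs[-1] -= sup[-1] * xf
    n = N - 1
    for m in range(1, n):                        # forward elimination
        tmp = sub[m] / diag[m - 1]
        diag[m] -= sup[m - 1] * tmp
        rhs[m] -= rhs[m - 1] * tmp
    x = [0.0] * n
    x[-1] = rhs[-1] / diag[-1]
    for m in range(n - 2, -1, -1):               # back substitution
        x[m] = (rhs[m] - sup[m] * x[m + 1]) / diag[m]
    return t, [x0] + x + [xf]

def max_error(N):
    """Largest deviation from the analytical solution x(t) = t + 4/t^2 of Eq. (6.6.10)."""
    t, x = bvp2_fdf(lambda s: 2.0 / s, lambda s: -2.0 / s**2, lambda s: 0.0,
                    1.0, 2.0, 5.0, 3.0, N)
    return max(abs(xi - (ti + 4.0 / ti**2)) for ti, xi in zip(t, x))

print(max_error(50), max_error(100))
```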

Note the following things about the shooting method and the finite difference
method:

• While the shooting method is applicable to linear/nonlinear BVPs, the finite
difference method is suitable for linear BVPs. However, we can also apply
the finite difference method in an iterative manner to solve nonlinear BVPs
(see Problem 6.10).

• Both methods can be modified to solve BVPs with mixed-boundary conditions
(see Problems 6.7 and 6.8).

• In MATLAB 6.x and later, the “bvp4c()” command is available for solving
linear/nonlinear BVPs with mixed-boundary conditions (see Problems 6.7–6.10).

• The symbolic computation command “dsolve()” introduced in Section
6.5.1 can be used to solve a BVP so long as the differential equation is linear,
that is, its coefficients may depend on time t but not on the (unknown)
dependent variable x(t).

• The usages of “bvp4c()” and “dsolve()” are illustrated in the program
“do_fdf.m”, where another symbolic computation command “subs()” is used
to evaluate a symbolic expression at certain value(s) of the variable.

Figure 6.9 A solution of a BVP obtained by using the finite difference method.

PROBLEMS 

6.0 MATLAB Commands quiver() and quiver3() and Differential Equation
(a) Usage of quiver()

Type ‘help quiver’ into the MATLAB command window, and you
will see the following program showing how to use the quiver()
command for plotting gradient vectors. You can also get Fig. P6.0.1 by
running the block of statements below. Try it and note that
the length of the gradient vector at each point is proportional to the slope at
that point.

%do_quiver
[x,y] = meshgrid(-2:.5:2,-1:.25:1);
z = x.*exp(-x.^2 - y.^2);
[px,py] = gradient(z,.5,.25);
contour(x,y,z), hold on, quiver(x,y,px,py)
axis image %the same as AXIS EQUAL except that
%the plot box fits tightly around the data



Figure P6.0.1 Graphs obtained by using gradient(), contour(), quiver(). 
Figure P6.0.2 Graphs obtained by using surfnorm(), quiver3(), surf().


(b) Usage of quiver3()

You can obtain Fig. P6.0.2 by running the block of statements that you
see after typing ‘help quiver3’ into the MATLAB command window.
Note that the “surfnorm()” command generates normal vectors
at points specified by (x, y, z) on the surface drawn by “surf()” and
the “quiver3()” command plots the normal vectors.

%do_quiver3
clear, clf
[x,y] = meshgrid(-2:.5:2,-1:.25:1);
z = x.*exp(-x.^2 - y.^2);
surf(x,y,z), hold on
[u,v,w] = surfnorm(x,y,z);
quiver3(x,y,z,u,v,w);


(c) Gradient Vectors and One-Variable Differential Equation

We might get the meaning of the solution of a differential equation
by using the “quiver()” command, which is used in the following
program “do_ode.m” for drawing the time derivatives at grid points as
defined by the differential equation

$$\frac{dy(t)}{dt} = -y(t) + 1 \quad \text{with the initial condition } y(0) = 0 \tag{P6.0.1}$$

The slope/direction field together with the numerical solution in
Fig. P6.0.3a is obtained by running the program, and it can be regarded
as a set of possible solution curve segments. Starting from the initial
point and moving along the slope vectors, you can get the solution
curve. Modify the program and run it to plot the slope/direction
field ($x_2(t)$ versus $x_1(t)$) and the numerical solution for the following
differential equation as depicted in Fig. P6.0.3b.

$$\begin{aligned} x_1'(t) &= x_2(t)\\ x_2'(t) &= -x_2(t) + 1 \end{aligned} \quad \text{with } \begin{bmatrix} x_1(0)\\ x_2(0)\end{bmatrix} = \begin{bmatrix} 1\\ -1\end{bmatrix} \text{ or } \begin{bmatrix} 1.5\\ -0.5\end{bmatrix} \tag{P6.0.2}$$

Figure P6.0.3 Possible solutions of differential equation and slope/direction field: (a) the graph of dy vs. dt for y'(t) = −y(t) + 1 and its solution; (b) the graph of dx2 vs. dx1 for x1'(t) = x2(t), x2'(t) = −x2(t) + 1.


%do_ode.m
% This uses quiver() to plot possible solution curve segments
% called the slope/directional field for y'(t) + y = 1
clear, clf
t0 = 0; tf = 2; tspan = [t0 tf]; x0 = 0;
[t,y] = meshgrid(t0:(tf - t0)/10:tf,0:.1:1);
pt = ones(size(t)); py = (1 - y).*pt; %dy = (1 - y)dt
quiver(t,y,pt,py) %y(displacement) vs. t(time)
axis([t0 tf + .2 0 1.05]), hold on
dy = inline('-y + 1','t','y');
[tR,yR] = ode_RK4(dy,tspan,x0,40);
for k = 1:length(tR), plot(tR(k),yR(k),'rx'), pause(0.001); end


6.1 A System of Linear Time-Invariant Differential Equations: An LTI State 
Equation 

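Consider the following state equation:

$$\begin{bmatrix} x_1'(t)\\ x_2'(t)\end{bmatrix} = \begin{bmatrix} 0 & 1\\ -2 & -3\end{bmatrix}\begin{bmatrix} x_1(t)\\ x_2(t)\end{bmatrix} + \begin{bmatrix} 0\\ 1\end{bmatrix}u_s(t) \quad \text{with } \begin{bmatrix} x_1(0)\\ x_2(0)\end{bmatrix} = \begin{bmatrix} 1\\ 0\end{bmatrix} \tag{P6.1.1}$$

(a) Check the procedure and the result of obtaining the analytical solution
by using the Laplace transform technique.

$$X(s) = [sI - A]^{-1}\{x(0) + BU(s)\} = \frac{1}{s(s+3) + 2}\begin{bmatrix} s+3 & 1\\ -2 & s\end{bmatrix}\begin{bmatrix} 1\\ 1/s\end{bmatrix} = \frac{1}{(s+1)(s+2)}\begin{bmatrix} s + 3 + 1/s\\ -2 + 1\end{bmatrix}$$

$$X_1(s) = \frac{s^2 + 3s + 1}{s(s+1)(s+2)} = \frac{1/2}{s} + \frac{1}{s+1} - \frac{1/2}{s+2}, \qquad x_1(t) = \frac{1}{2} + e^{-t} - \frac{1}{2}e^{-2t} \tag{P6.1.2a}$$

$$X_2(s) = \frac{-1}{(s+1)(s+2)} = \frac{-1}{s+1} + \frac{1}{s+2}, \qquad x_2(t) = -e^{-t} + e^{-2t} \tag{P6.1.2b}$$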


(b) Find the numerical solution of the above state equation by using the
routine “ode_RK4()” (with the number of segments N = 50) and the
MATLAB built-in routine “ode45()”. Compare their execution times
(by using tic and toc) and their closeness to the analytical solution.

6.2 A Second-Order Linear Time-Invariant Differential Equation
Consider the following second-order differential equation:

$$x''(t) + 3x'(t) + 2x(t) = 1 \quad \text{with} \quad x(0) = 1,\; x'(0) = 0 \tag{P6.2.1}$$

(a) Check the procedure and the result of obtaining the analytical solution by using the Laplace transform technique.

$$s^2X(s) - sx(0) - x'(0) + 3(sX(s) - x(0)) + 2X(s) = \frac{1}{s}, \qquad X(s) = \frac{s^2 + 3s + 1}{s(s^2 + 3s + 2)}$$

$$x(t) = \frac{1}{2} + e^{-t} - \frac{1}{2}e^{-2t} \tag{P6.2.2}$$


(b) Define the differential equation (P6.2.1) in an M-file so that it can be 
passed to the MATLAB routines like “ode_RK4()” or “ode45()” as 
their input argument (see Section 6.5.1). 
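One way to prepare (P6.2.1) for such routines is to rewrite it as a state equation with x_1 = x and x_2 = x'. The sketch below does this with an anonymous function; the name df6p02 is hypothetical, and the check against (P6.2.2) uses ode45() only for illustration.

```matlab
% Sketch: (P6.2.1) as a state equation with x1 = x, x2 = x'
%   x1' = x2,  x2' = -2*x1 - 3*x2 + 1
df6p02 = @(t,x) [x(2); -2*x(1) - 3*x(2) + 1];   % hypothetical name
dx_init = df6p02(0,[1; 0])          % derivative at x(0) = 1, x'(0) = 0
[t,x] = ode45(df6p02,[0 5],[1; 0],odeset('RelTol',1e-8,'AbsTol',1e-10));
err = max(abs(x(:,1) - (1/2 + exp(-t) - exp(-2*t)/2)))   % compare with (P6.2.2)
```

Saved as a separate M-file (hypothetically df6p02.m), the same right-hand side would read `function dx = df6p02(t,x)` followed by `dx = [x(2); -2*x(1) - 3*x(2) + 1];`.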


6.3 Ordinary Differential Equation and State Equation
(a) Van der Pol Equation

Consider a nonlinear differential equation

$$\frac{d^2}{dt^2}y(t) - \mu(1 - y^2(t))\frac{d}{dt}y(t) + y(t) = 0 \quad \text{with} \quad \mu = 2 \tag{P6.3.1}$$

Compose a program to solve this equation with the initial conditions $[y(0)\; y'(0)] = [0.5\; 0]$ and $[-1\; 2]$ for the time interval [0, 20] and plot $y'(t)$ versus $y(t)$ as well as $y(t)$ and $y'(t)$ along the t-axis.
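A possible starting point, assuming the usual state choice x_1 = y, x_2 = y'. The plot commands are left as comments so the sketch runs without a display; the amplitude check relies on the well-known property that Van der Pol trajectories settle onto a limit cycle whose amplitude is close to 2.

```matlab
% Sketch: Van der Pol equation (P6.3.1) with x1 = y, x2 = y'
mu = 2;
dfvdp = @(t,x) [x(2); mu*(1 - x(1)^2)*x(2) - x(1)];
ics = [0.5 0; -1 2];                  % the two initial conditions [y(0) y'(0)]
amp = zeros(1,2);
for k = 1:2
   [t,x] = ode45(dfvdp,[0 20],ics(k,:)');
   % plot(t,x(:,1),t,x(:,2)) for y(t), y'(t); plot(x(:,1),x(:,2)) for y' vs. y
   amp(k) = max(abs(x(t > 10,1)));    % amplitude after the transient has died out
end
amp
```

Both initial conditions should end on the same closed curve in the (y, y') plane, which is what the phase-plane plot is meant to reveal.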
(b) Lorenz Equation: Turbulent Flow and Chaos
Consider a nonlinear state equation.

$$\begin{aligned} x_1'(t) &= \sigma(x_2(t) - x_1(t)) && \sigma = 10 \\ x_2'(t) &= (1 + \lambda - x_3(t))x_1(t) - x_2(t) && \text{with } \lambda = 20, 100 \\ x_3'(t) &= x_1(t)x_2(t) - \gamma x_3(t) && \gamma = 2 \end{aligned} \tag{P6.3.2}$$

Compose a program to solve this equation with λ = 20 and 100 for the time interval [0, 10] and plot $x_3(t)$ versus $x_1(t)$. Let the initial condition be $[x_1(0)\; x_2(0)\; x_3(0)] = [-8\; -16\; 80]$.

(c) Chemical Reactor

Consider a nonlinear state equation describing the concentrations of two reactants and one product in the chemical process.

$$\begin{aligned} x_1'(t) &= a(u_1 - x_1(t)) - bx_1(t)x_2(t) && a = 5 \\ x_2'(t) &= a(u_2 - x_2(t)) - bx_1(t)x_2(t) && \text{with } b = 2 \\ x_3'(t) &= -ax_3(t) + bx_1(t)x_2(t) && u_1 = 3,\; u_2 = 5 \end{aligned} \tag{P6.3.3}$$

Compose a program to solve this equation for the time interval [0, 1] and plot $x_1(t)$, $x_2(t)$, and $x_3(t)$. Let the initial condition be $[x_1(0)\; x_2(0)\; x_3(0)] = [1\; 2\; 3]$.
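A sketch of one way to solve (P6.3.3) with ode45(). It also includes a self-check that is not in the text: subtracting the first two equations gives a linear ODE, d(x_1 - x_2)/dt = -a((x_1 - x_2) - (u_1 - u_2)), so with the given data x_1 - x_2 = -2 + e^{-5t} exactly, which the numerical solution should reproduce.

```matlab
% Sketch: chemical reactor (P6.3.3) solved with ode45
a = 5; b = 2; u1 = 3; u2 = 5;
dfreact = @(t,x) [a*(u1 - x(1)) - b*x(1)*x(2); ...
                  a*(u2 - x(2)) - b*x(1)*x(2); ...
                  -a*x(3) + b*x(1)*x(2)];
[t,x] = ode45(dfreact,[0 1],[1; 2; 3],odeset('RelTol',1e-8,'AbsTol',1e-10));
% plot(t,x) would show x1(t), x2(t), x3(t)
% consistency check: x1 - x2 = (u1 - u2) + ((1 - 2) - (u1 - u2))*exp(-a*t) = -2 + exp(-5t)
chk = max(abs((x(:,1) - x(:,2)) - (-2 + exp(-5*t))))
```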

(d) Cantilever Beam: A Differential Equation w.r.t. a Spatial Variable

Consider a nonlinear state equation describing the vertical deflection of a beam due to its own weight.



(P6.3.4)


where JE = 2000 kg·m³/s², ρ = 10 kg/m, g = 9.8 m/s², and L = 2 m. Write a program to solve this equation for the interval [0, L] and plot $y(t)$. Let the initial condition be $[y(0)\; y'(0)] = [0\; 0]$. Note that the independent variable, for which we usually use the symbol ‘t’ in writing the differential function, physically denotes not time but the x-coordinate of the cantilever beam along the horizontal axis in this problem.

(e) Phase-Locked Loop (PLL)

Consider a nonlinear state equation describing the behavior of a PLL circuit depicted in Fig. P6.3.1.

$$\begin{aligned} x_1'(t) &= \frac{a\,u(t)\cos(x_2(t)) - x_1(t)}{\tau} && a = 1500,\; \tau = 0.002 \\ x_2'(t) &= x_1(t) + \omega_c && u(t) = \sin(\omega_o t) \end{aligned} \tag{P6.3.5a}$$

$$y(t) = x_1(t) + \omega_c \tag{P6.3.5b}$$

Figure P6.3.1 The block diagram of PLL circuit.

where $\omega_o = 2100\pi$ [rad/s] and $\omega_c = 2000\pi$ [rad/s]. Compose a program to solve this equation for the time interval [0, 0.03] and plot $y(t)$ and $\omega_o$. Let the initial condition be $[x_1(0)\; x_2(0)] = [0\; 0]$. Is the output $y(t)$ tracking the frequency $\omega_o$ of the input $u(t)$?
(f) DC Motor

Consider a linear differential equation describing the behavior of a DC motor system (Fig. P6.3.2)

$$\begin{aligned} J\frac{d^2\theta(t)}{dt^2} + B\frac{d\theta(t)}{dt} &= T(t) = K_T\,i(t) \\ L\frac{di(t)}{dt} + R\,i(t) &= v(t) - v_b(t) = v(t) - K_b\frac{d\theta(t)}{dt} \end{aligned} \tag{P6.3.6}$$

Convert this system of equations into a first-order vector differential equation, that is, a state equation with respect to the state vector $[\theta(t)\; \theta'(t)\; i(t)]^T$.
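A sketch of the resulting state equation, assuming the standard armature model written in (P6.3.6): with the state [θ; ω; i] (ω = θ'), θ' = ω, ω' = (K_T i - Bω)/J, i' = (v - Ri - K_b ω)/L. All numerical parameter values below are hypothetical, chosen only so the code runs. As a check, the steady-state speed for a constant voltage v should equal K_T v/(K_b K_T + RB).

```matlab
% Hypothetical parameter values (NOT from the text), for illustration only
J = 0.01; B = 0.1; L = 0.5; R = 1; KT = 0.01; Kb = 0.01; v = 1;
Am = [0    1      0;                % theta' = omega
      0  -B/J   KT/J;               % omega' = (KT*i - B*omega)/J
      0  -Kb/L  -R/L];              % i'     = (v - R*i - Kb*omega)/L
Bm = [0; 0; 1/L];                   % input vector multiplying v(t)
% steady state: omega' = i' = 0, i.e. the lower 2x2 block must balance the input
ss = -Am(2:3,2:3) \ (Bm(2:3)*v);
omega_ss = ss(1)
omega_formula = KT*v/(Kb*KT + R*B)
```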
Figure P6.3.2 A DC motor system (angular displacement θ(t); back e.m.f. v_b(t) = K_b ω(t) = K_b θ'(t); torque T(t) = K_T i(t)).

(g) RC Circuit: A Stiff System

Consider a two-mesh RC circuit depicted in Fig. P6.3.3 (R_1 = 100 Ω, C_1 = 10 μF).

Figure P6.3.3 A two-mesh RC circuit.

We can write the mesh equations with respect to the two mesh currents $i_1(t)$ and $i_2(t)$ as


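$$R_1 i_1(t) + \frac{1}{C_1}\int_0^t i_1(\tau)\,d\tau + R_2(i_1(t) - i_2(t)) = v(t) = t$$

$$R_2(i_2(t) - i_1(t)) + \frac{1}{C_2}\int_0^t i_2(\tau)\,d\tau = 0 \tag{P6.3.7a}$$

In order to convert this system of differential equations into a state equation, we differentiate both sides and rearrange them to get

$$(R_1 + R_2)\frac{di_1(t)}{dt} - R_2\frac{di_2(t)}{dt} + \frac{1}{C_1}i_1(t) = \frac{dv(t)}{dt} = 1$$

$$-R_2\frac{di_1(t)}{dt} + R_2\frac{di_2(t)}{dt} + \frac{1}{C_2}i_2(t) = 0 \tag{P6.3.7b}$$

$$\begin{bmatrix} R_1 + R_2 & -R_2 \\ -R_2 & R_2 \end{bmatrix}\begin{bmatrix} i_1'(t) \\ i_2'(t) \end{bmatrix} = \begin{bmatrix} 1 - i_1(t)/C_1 \\ -i_2(t)/C_2 \end{bmatrix} \tag{P6.3.7c}$$

$$\begin{bmatrix} i_1'(t) \\ i_2'(t) \end{bmatrix} = \begin{bmatrix} G_1 & G_1 \\ G_1 & G_1 + G_2 \end{bmatrix}\begin{bmatrix} 1 - i_1(t)/C_1 \\ -i_2(t)/C_2 \end{bmatrix} \quad \text{with } G_i = 1/R_i$$

$$= \begin{bmatrix} -G_1/C_1 & -G_1/C_2 \\ -G_1/C_1 & -(G_1 + G_2)/C_2 \end{bmatrix}\begin{bmatrix} i_1(t) \\ i_2(t) \end{bmatrix} + \begin{bmatrix} G_1 \\ G_1 \end{bmatrix}u_s(t) \tag{P6.3.7d}$$

where $u_s(t)$ denotes the unit step function whose value is 1 (one) for all t > 0.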

(i) After constructing an M-file function “df6p03g.m” which defines Eq. (P6.3.7d) with R_1 = 100 [Ω], C_1 = 10 [μF], R_2 = 1 [kΩ], C_2 = 10 [μF], use the MATLAB built-in routines “ode45()” and “ode23s()” to solve the state equation with the zero initial condition $i_1(0) = i_2(0) = 0$ and plot the numerical solution $i_2(t)$ for $0 \le t \le 0.05$ s. For possible change of parameters, you may declare R_1, C_1, R_2, C_2 as global variables both in the function and in the main program named, say, “nm6p03g.m”. Do you see any symptom of stiffness in the results?
(ii) If we apply the Laplace transform technique to solve this equation with zero initial condition i(0) = 0, we can get

$$\begin{bmatrix} I_1(s) \\ I_2(s) \end{bmatrix} \overset{(6.5.5)}{=} [sI - A]^{-1}BU(s) = \begin{bmatrix} s + G_1/C_1 & G_1/C_2 \\ G_1/C_1 & s + (G_1 + G_2)/C_2 \end{bmatrix}^{-1}\begin{bmatrix} G_1 \\ G_1 \end{bmatrix}\frac{1}{s}$$

$$I_2(s) = \frac{G_1}{s^2 + (G_1/C_1 + (G_1 + G_2)/C_2)s + G_1G_2/C_1C_2} = \frac{1/100}{s^2 + 2100s + 100000} = \frac{1/100}{(s + 2051.25)(s + 48.75)} = \frac{1}{200250}\left(\frac{1}{s + 48.75} - \frac{1}{s + 2051.25}\right)$$

$$i_2(t) = \frac{1}{200250}\left(e^{-48.75t} - e^{-2051.25t}\right) \tag{P6.3.7e}$$

where $\lambda_1 = -2051.25$ and $\lambda_2 = -48.75$ are actually the eigenvalues of the system matrix A in Eq. (P6.3.7d). Find the measure of stiffness defined by Eq. (6.5.26).
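A quick numerical cross-check of the eigenvalues quoted in (ii), using the parameter values from (i). The stiffness ratio computed here (largest over smallest eigenvalue magnitude) is an assumption about what Eq. (6.5.26) measures, since that equation is not reproduced in this section.

```matlab
% Sketch: eigenvalues of the system matrix in (P6.3.7d) and a stiffness ratio
R1 = 100; C1 = 10e-6; R2 = 1e3; C2 = 10e-6;
G1 = 1/R1; G2 = 1/R2;
A = [-G1/C1 -G1/C2; -G1/C1 -(G1 + G2)/C2];
lambdas = sort(eig(A))              % expect about -2051.25 and -48.75
stiffness_ratio = max(abs(lambdas))/min(abs(lambdas))   % assumed definition
```

A ratio of roughly 42 means the fast mode decays about 42 times faster than the slow one, which is the kind of gap that makes an explicit solver such as ode45() take small steps long after the fast transient has vanished.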

(iii) Using the MATLAB symbolic computation command “dsolve()”, find the analytical solution of the differential equation (P6.3.7b) and plot $i_2(t)$ together with (P6.3.7e) for $0 \le t \le 0.05$ s. Which of the two numerical solutions obtained in (i) is better? You may refer to the following code:



6.4 Physical Meaning of a Solution for Differential Equation and Its Animation

Suppose we are going to simulate how a vehicle vibrates when it moves with a constant speed on a rugged road, as depicted in Fig. P6.4a. Based on Newton’s second law, the situation is modeled by the differential equation (P6.4.1).

$$M\frac{d^2}{dt^2}y(t) + B\frac{d}{dt}(y(t) - u(t)) + K(y(t) - u(t)) = 0 \quad \text{with} \quad y(0) = 0,\; y'(0) = 0 \tag{P6.4.1}$$
%do_MBK
clf
t0 = 0; tf = 10; x0 = [0 0];
[t1,x] = ode_Ham('f_MBK',[t0 tf],x0);
dt = t1(2) - t1(1);
for n = 1:length(t1)
   u(n) = udu_MBK(t1(n));
end
figure(1), clf
animation = 1;
if animation
   figure(2), clf
   draw_MBK(5,1,x(1,2),u(1))
   axis([-2 2 -1 14]), axis('equal')
   pause
   for n = 1:length(t1)
      clf, draw_MBK(5,1,x(n,2),u(n),'b')
      axis([-2 2 -1 14]), axis('equal')
      pause(dt)
      figure(1)
      plot(t1(n),u(n),'r.', t1(n),x(n,2),'b.')
      axis([0 tf -0.2 1.2]), hold on
      figure(2)
   end
   draw_MBK(5,1,x(n,2),u(n))
   axis([-2 2 -1 14]), axis('equal')
end


function [u,du] = udu_MBK(t) 
i = fix(t); 

if mod(i,2) == 0, u = t-i; du = 1; 
else u = 1 - t + i; du = -1; 
end 


function draw_MBK(n,w,y,u,color)
%n: the # of spring windings
%w: the width of each object
%y: displacement of the top of MBK
%u: displacement of the bottom of MBK
if nargin < 5, color = 'k'; end
p1 = [-w u + 4]; p2 = [-w 9 + y];
xm = 0; ym = (p1(2) + p2(2))/2;
xM = xm + w*1.2*[-1 -1 1 1 -1];
yM = p2(2) + w*[1 3 3 1 1];
plot(xM,yM,color), hold on %Mass
spring(n,p1,p2,w,color) %Spring
damper(xm + w,p1(2),p2(2),w,color) %Damper
wheel_my(xm,p1(2) - 3*w,w,color) %Wheel


function dx = f_MBK(t,x)
% state equation (P6.4.2); x is treated as a row vector [x1 x2]
M = 1; B = 0.1; K = 0.1;
[u,du] = udu_MBK(t);
dx = x*[0 1; -K/M -B/M]' + [0 (K*u + B*du)/M];
function spring(n,p1,p2,w,color)
%draw a spring of n windings, width w from p1 to p2
if nargin < 5, color = 'k'; end
c = (p2(1) - p1(1))/2; d = (p2(2) - p1(2))/2;
f = (p2(1) + p1(1))/2; g = (p2(2) + p1(2))/2;
y = -1:0.01:1; t = (y + 1)*pi*(n + 0.5);
x = -0.5*w*sin(t); y = y + 0.15*(1 - cos(t));
a = y(1); b = y(length(x));
y = 2*(y - a)/(b - a) - 1;
yyS = d*y - c*x + g; xxS = x + f; xxS1 = [f f];
yyS1 = yyS(length(yyS)) + [0 w]; yyS2 = yyS(1) - [0 w];
plot(xxS,yyS,color, xxS1,yyS1,color, xxS1,yyS2,color)

function damper(xm,y1,y2,w,color)
%draws a damper in (xm-0.5 xm+0.5 y1 y2)
if nargin < 5, color = 'k'; end
ym = (y1 + y2)/2;
xD1 = xm + w*[0.3*[0 0 -1 1]]; yD1 = [y2 + w ym ym ym];
xD2 = xm + w*[0.5*[-1 -1 1 1]]; yD2 = ym + w*[1 -1 -1 1];
xD3 = xm + [0 0]; yD3 = [y1 ym] - w;
plot(xD1,yD1,color, xD2,yD2,color, xD3,yD3,color)

function wheel_my(xm,ym,w,color)
%draws a wheel of size w at center (xm,ym)
if nargin < 4, color = 'k'; end
xW1 = xm + w*1.2*[-1 1]; yW1 = ym + w*[2 2];
xW2 = xm*[1 1]; yW2 = ym + w*[2 0];
plot(xW1,yW1,color, xW2,yW2,color)
th = [0:100]/50*pi; plot(xm + j*ym + w*exp(j*th),color)
where the values of the mass, the viscous friction coefficient, and the spring constant are given as M = 1 kg, B = 0.1 N·s/m, and K = 0.1 N/m, respectively. The input to this system is the movement $u(t)$ of the wheel part, which causes the movement $y(t)$ of the body as the output of the system; it is approximated by a triangular wave of height 1 m, duration 1 s, and period 2 s, as depicted in Fig. P6.4b. After converting this equation into a state equation as

$$\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -K/M & -B/M \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} + \begin{bmatrix} 0 \\ (B/M)u'(t) + (K/M)u(t) \end{bmatrix} \tag{P6.4.2}$$

we can use such routines as ode_Ham(), ode45(), ... to solve this state equation and use some graphic functions to draw not only the graphs of $y(t)$ and $u(t)$, but also the animated simulation diagram. You can run the above MATLAB program “do_MBK.m” to see the results. Does the suspension system made of a spring and a damper, as depicted in Fig. P6.4a, absorb the shock caused by the rolling wheel effectively enough that the amplitude of the vehicle body oscillation is less than 1/5 that of the wheel oscillation?
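A frequency-domain sanity check that complements the simulation. From (P6.4.1) the transfer function is Y(s)/U(s) = (Bs + K)/(Ms² + Bs + K), and the triangular wheel input has the Fourier series u(t) = 1/2 - (4/π²)Σ_{n odd} cos(nπt)/n². The sketch below sums the steady-state contributions of the first five odd harmonics as a rough upper bound on the body's oscillation amplitude; it ignores the start-up transient and is an estimate, not the book's method.

```matlab
% Sketch: steady-state estimate of the body oscillation for (P6.4.1)
M = 1; B = 0.1; K = 0.1;
n = 1:2:9;                                   % odd harmonics of the triangle wave
w = n*pi;                                    % period 2 s -> fundamental pi rad/s
H = (K + 1j*w*B)./(K - M*w.^2 + 1j*w*B);     % Y(jw)/U(jw)
gain = abs(H)
c = 4./(pi^2*n.^2);                          % harmonic amplitudes of the wheel input
yAmpBound = sum(gain.*c);                    % bound on the body's oscillation amplitude
uAmp = 0.5;                                  % wheel oscillation amplitude about its mean
ratio = yAmpBound/uAmp
```

Because the natural frequency sqrt(K/M) ≈ 0.32 rad/s lies far below the π rad/s excitation, the spring-damper strongly attenuates the oscillation; the simulated y(t) should additionally show the slow settling toward the mean value 0.5.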

(cf) If you are interested in graphic visualization with MATLAB, you can refer to [N-1].

6.5 A Nonlinear Differential Equation for an Orbit of a Satellite

Consider the problem of an orbit of a satellite, whose position and velocity are obtained as the solution of the following state equation:

$$\begin{aligned} x_1'(t) &= x_3(t) \\ x_2'(t) &= x_4(t) \\ x_3'(t) &= -GM_E\,x_1(t)/(x_1^2(t) + x_2^2(t))^{3/2} \\ x_4'(t) &= -GM_E\,x_2(t)/(x_1^2(t) + x_2^2(t))^{3/2} \end{aligned} \tag{P6.5.1}$$

where G = 6.672 × 10^-11 N·m²/kg² is the gravitational constant, and M_E = 5.97 × 10^24 kg is the mass of the earth. Note that $(x_1, x_2)$ and $(x_3, x_4)$ denote the position and velocity, respectively, of the satellite on the plane having the earth at its origin. This state equation is defined in the M-file “df_sat.m” below.

(a) Supplement the following program “nm6p05.m”, which uses the three routines ode_RK4(), ode45(), and ode23s() to find the paths of the satellite with the following initial positions/velocities for one day.
function dx = df_sat(t,x)
global G Me Re
dx = zeros(size(x));
r = sqrt(sum(x(1:2).^2));
if r <= Re, return; end % when colliding against the earth surface
GMr3 = G*Me/r^3;
dx(1) = x(3); dx(2) = x(4); dx(3) = -GMr3*x(1); dx(4) = -GMr3*x(2);


%nm6p05.m to solve a nonlinear d.e. on the orbit of a satellite
clear, clf
global G Me Re
G = 6.67e-11; Me = 5.97e24; Re = 64e5;
f = 'df_sat';
t0 = 0; T = 24*60*60; tf = T; N = 2000;
R = 4.223e7;
v20s = [3071 3500 2000];
for iter = 1:length(v20s)
   x10 = R; x20 = 0; v10 = 0; v20 = v20s(iter);
   x0 = [x10 x20 v10 v20]; tol = 1e-6;
   [tR,xR] = ode_RK4(f,[t0 tf],x0,N);
   [t45,x45] = ode45(????????????);
   [t23s,x23s] = ode23s(f,[t0 tf],x0);
   plot(xR(:,1),xR(:,2),'b', x45(:,1),x45(:,2),'k.', ????????????)
   [t45,x45] = ode45(f,[t0 tf],x0,odeset('RelTol',tol));
   [t23s,x23s] = ode23s(?????????????????????????????????);
   plot(xR(:,1),xR(:,2),'b', x45(:,1),x45(:,2),'k.', ????????????)
end


(i) $(x_{10}, x_{20}) = (4.223 \times 10^7, 0)$ [m] and $(x_{30}, x_{40}) = (v_{10}, v_{20}) = (0, 3071)$ [m/s].

(ii) $(x_{10}, x_{20}) = (4.223 \times 10^7, 0)$ [m] and $(x_{30}, x_{40}) = (v_{10}, v_{20}) = (0, 3500)$ [m/s].

(iii) $(x_{10}, x_{20}) = (4.223 \times 10^7, 0)$ [m] and $(x_{30}, x_{40}) = (v_{10}, v_{20}) = (0, 2000)$ [m/s].

Run the program and check if the plotting results are as depicted in 
Fig. P6.5. 
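Why case (i) uses 3071 m/s: for a circular orbit of radius R, gravity supplies the centripetal force, so v = sqrt(GM_E/R). The sketch below computes that speed and the orbital period for R = 4.223 × 10^7 m, then integrates (P6.5.1) for one period with tight ode45() tolerances and checks that the satellite returns to its start. The tolerance values are illustrative choices, not from the text.

```matlab
% Sketch: circular-orbit speed and period for R = 4.223e7 m, checked with ode45
G = 6.67e-11; Me = 5.97e24; R = 4.223e7;
v_circ = sqrt(G*Me/R)           % speed for a circular orbit of radius R
T_orbit = 2*pi*R/v_circ         % orbital period (compare with 24*60*60 s)
GM = G*Me;
f = @(t,x) [x(3); x(4); -GM*x(1)/norm(x(1:2))^3; -GM*x(2)/norm(x(1:2))^3];
opts = odeset('RelTol',1e-10,'AbsTol',1e-6);
[t,x] = ode45(f,[0 T_orbit],[R; 0; 0; v_circ],opts);
closure_err = norm(x(end,1:2) - [R 0])/R   % relative distance from the start
```

Since the period comes out at almost exactly one day, case (i) is essentially a geostationary orbit; cases (ii) and (iii) start faster or slower than v_circ and so trace ellipses.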

(b) In Fig. P6.5, we see that the “ode23s()” solution path differs from the others for case (ii) and that the “ode45()” and “ode23s()” paths differ from the “ode_RK4()” path for case (iii). But we do not know which one is more accurate. In order to find which one is the closest to the true solution, apply the two routines “ode45()” and “ode23s()” with a smaller relative error tolerance of tol = 1e-6 to find the paths for the three cases. Which one do you think is the closest to the true solution among the paths obtained in (a)?

Figure P6.5 The paths of a satellite with the same initial position and different initial velocities.

(cf) The purpose of this problem is not to compare the several MATLAB routines, but to warn users of the danger of misusing them. With a smaller number of steps (N) (i.e., a larger step size), the routine “ode_RK4()” will also deviate considerably from the true solution. The MATLAB built-in routines have too many good features to be mentioned here. Note that setting parameters such as the relative error tolerance (RelTol) is sometimes very important for obtaining a reasonably accurate solution.

6.6 Shooting Method for BVP with Adjustable Position and Fixed Angle
Suppose the boundary condition for a second-order BVP is given as

$$x'(t_0) = x_{20}, \quad x(t_f) = x_f \tag{P6.6.1}$$

Consider how to modify the MATLAB routines “bvp2_shoot()” and “bvp2_fdf()” so that they can accommodate this kind of problem.

(a) As for “bvp2_shootp()” that you should make, the variable quantity to adjust for improving the approximate solution is not the derivative $x'(t_0)$ but the position $x(t_0)$, and what should be made close to zero is still $f(x(t_0)) = x(t_f) - x_f$. Modify the routine in such a way that $x(t_0)$ is adjusted to make this quantity close to zero, and make its declaration part have the initial derivative (dx0) instead of the initial position (x0) as the fourth input argument as follows.

function [t,x] = bvp2_shootp(f,t0,tf,dx0,xf,N,tol,kmax)
Noting that the initial derivative of the true solution for Eq. (6.6.3) is zero, apply this routine to solve the BVP by inserting the following statement into the program “do_shoot.m”.

[t,x1] = bvp2_shootp('df661',t0,tf,0,xf,N,tol,kmax);

and plot the result to check if it conforms with that (Fig. 6.8) obtained by “bvp2_shoot()”.
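The idea behind bvp2_shootp() can be tried on a small test problem before touching the book's routine. The sketch below uses a hypothetical BVP, x'' = x with x'(0) = 0 and x(1) = cosh(1) (true solution cosh t, so the unknown x(0) is 1), integrates with ode45(), and adjusts x(t0) by secant updates on the mismatch x(tf) - xf. It is an illustration of the technique, not the book's code.

```matlab
% Sketch: shooting on the unknown initial position x(t0) with secant updates
% hypothetical test BVP: x'' = x, x'(0) = 0, x(1) = cosh(1)  (true x(t) = cosh(t))
f = @(t,x) [x(2); x(1)];                 % state form of x'' = x
t0 = 0; tf = 1; dx0 = 0; xf = cosh(1);
opts = odeset('RelTol',1e-10,'AbsTol',1e-12);
s = [0.5 2]; e = zeros(1,2);              % two arbitrary guesses for x(t0)
for k = 1:2
   [t,x] = ode45(f,[t0 tf],[s(k); dx0],opts);
   e(k) = x(end,1) - xf;                  % mismatch x(tf) - xf
end
for k = 3:10                              % secant method on e(s) = 0
   if e(k-1) == e(k-2), break; end        % guard against a zero denominator
   s(k) = s(k-1) - e(k-1)*(s(k-1) - s(k-2))/(e(k-1) - e(k-2));
   [t,x] = ode45(f,[t0 tf],[s(k); dx0],opts);
   e(k) = x(end,1) - xf;
   if abs(e(k)) < 1e-8, break; end
end
x0_found = s(end)                         % should approach cosh(0) = 1
```

For a linear BVP the mismatch is linear in x(t0), so a single secant step already lands on the answer; for Eq. (6.6.3) several iterations would be needed.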

(b) As for “bvp2_fdfp()” implementing the finite difference method, you have to approximate the boundary condition as

$$x'(t_0) = x_{20} \;\rightarrow\; \frac{x_1 - x_{-1}}{2h} = x_{20}, \quad x_{-1} = x_1 - 2hx_{20} \tag{P6.6.2}$$

substitute this into the finite difference equation corresponding to the initial time as

$$\frac{x_1 - 2x_0 + x_{-1}}{h^2} + a_{10}\frac{x_1 - x_{-1}}{2h} + a_{00}x_0 = u_0 \tag{P6.6.3}$$

$$\frac{x_1 - 2x_0 + x_1 - 2hx_{20}}{h^2} + a_{10}x_{20} + a_{00}x_0 = u_0$$

$$(a_{00}h^2 - 2)x_0 + 2x_1 = h^2u_0 + h(2 - ha_{10})x_{20} \tag{P6.6.4}$$

and augment the matrix-vector equation with this equation. Also, make its declaration part have the initial derivative (dx0) instead of the initial position (x0) as the sixth input argument as follows:

function [t,x] = bvp2_fdfp(a1,a0,u,t0,tf,dx0,xf,N)

Noting that the initial derivative of the true solution for Eq. (6.6.10) is -7, apply this routine to solve the BVP by inserting the following statement into the program “do_fdf.m”.

[t,x1] = bvp2_fdfp(a1,a0,u,t0,tf,-7,xf,N);

and plot the result to check if it conforms with that obtained by using “bvp2_fdf()” and depicted in Fig. 6.9.
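A compact sketch of the ghost-point treatment in (P6.6.2) to (P6.6.4), written for x'' + a1(t)x' + a0(t)x = u(t). It is tried on a hypothetical test problem, x'' - x = 0 with x'(0) = 0 and x(1) = cosh(1) (true solution cosh t), and uses a dense matrix with backslash rather than the book's trid() helper; everything here is an illustration, not bvp2_fdfp() itself.

```matlab
% Sketch: finite differences with a derivative condition at t0 via a ghost point
a1 = @(t) 0*t; a0 = @(t) -1 + 0*t; u = @(t) 0*t;  % x'' + a1 x' + a0 x = u (here x'' - x = 0)
t0 = 0; tf = 1; dx0 = 0; xf = cosh(1); N = 100;
h = (tf - t0)/N; t = t0 + (0:N)'*h;
A = zeros(N,N); b = zeros(N,1);          % unknowns x_0 ... x_{N-1}
% first row: Eq. (P6.6.4), obtained with x_{-1} = x_1 - 2h*x'(t0)
A(1,1:2) = [a0(t(1))*h^2 - 2, 2];
b(1) = h^2*u(t(1)) + h*(2 - h*a1(t(1)))*dx0;
for i = 2:N                              % interior equation at node t(i)
   A(i,i-1) = 1 - h*a1(t(i))/2;
   A(i,i) = -2 + h^2*a0(t(i));
   b(i) = h^2*u(t(i));
   if i < N
      A(i,i+1) = 1 + h*a1(t(i))/2;
   else
      b(i) = b(i) - (1 + h*a1(t(i))/2)*xf;   % known x_N moved to the right-hand side
   end
end
x = [A\b; xf];
max_err = max(abs(x - cosh(t)))
```

The central-difference ghost point keeps the boundary row second-order accurate, matching the O(h²) interior equations.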

6.7 BVP with Mixed-Boundary Conditions I

Suppose the boundary condition for a second-order BVP is given as

$$x(t_0) = x_{10}, \quad c_1x(t_f) + c_2x'(t_f) = c_3 \tag{P6.7.1}$$

Consider how to modify the MATLAB routines “bvp2_shoot()” and “bvp2_fdf()” so that they can accommodate this kind of problem.
(a) As for “bvp2_shoot()” that you should modify, the variable quantity to adjust for improving the approximate solution is still the derivative $x'(t_0)$, but what should be made close to zero is

$$f(x'(t_0)) = c_1x(t_f) + c_2x'(t_f) - c_3 \tag{P6.7.2}$$

If you don’t know where to begin, modify the routine “bvp2_shoot()” in such a way that $x'(t_0)$ is adjusted to make this quantity close to zero. Regarding the quantity (P6.7.2) as a function of $x'(t_0)$, you may feel as if you were going to solve a nonlinear equation $f(x'(t_0)) = 0$. Here are a few hints for this job:

• Make the declaration part have the boundary coefficient vector cf = [c1 c2 c3] instead of the final position (xf) as the fifth input argument as follows.

function [t,x] = bvp2m_shoot(f,t0,tf,x0,cf,N,tol,kmax)

• Pick the first two guesses of $x'(t_0)$ arbitrarily.

• You may need to replace a couple of statements in “bvp2_shoot()” by

e(1) = cf*[x(end,:)'; -1];
e(k) = cf*[x(end,:)'; -1];

Now that you have the routine “bvp2m_shoot()” of your own making, don’t hesitate to try using the weapon to attack the following problem:

$$x''(t) - 4t\,x(t)x'(t) - 2x^2(t) = 0 \quad \text{with} \quad x(0) = \frac{1}{4},\; 2x(1) - 3x'(1) = 0 \tag{P6.7.3}$$

For this job, you only have to modify one statement of the program “do_shoot” (Section 6.6.1) into

[t,x] = bvp2m_shoot('df661',t0,tf,x0,[2 -3 0],N,tol,kmax);

If you run it to obtain the same solution as depicted in Fig. 6.8, you deserve to be proud of yourself having this book as well as MATLAB; otherwise, just keep trying until you succeed.

(b) As for “bvp2_fdf()” that you should modify, you have only to augment the matrix-vector equation with one row corresponding to the approximate version of the boundary condition $c_1x(t_f) + c_2x'(t_f) = c_3$, that is,

$$c_1x_N + c_2\frac{x_N - x_{N-1}}{h} = c_3; \quad -c_2x_{N-1} + (c_1h + c_2)x_N = c_3h \tag{P6.7.4}$$

Needless to say, you should increase the dimension of the matrix A to N and move the $x_N$ term on the right-hand side of the (N - 1)th row back to the left-hand side by incorporating the corresponding statement into the for loop. What you have to do with “bvp2m_fdf()” for this job is as follows:

• Make the declaration part have the boundary coefficient vector cf = [c1 c2 c3] instead of the final position (xf) as the seventh input argument.

function [t,x] = bvp2m_fdf(a1,a0,u,t0,tf,x0,cf,N)

• Replace some statement by A = zeros(N,N).

• Increase the last index of the for loop to N-1.

• Replace the statements corresponding to the (N - 1)th row equation by

A(N,N-1:N) = [-cf(2) cf(1)*h + cf(2)]; b(N) = cf(3)*h;

which implements Eq. (P6.7.4).

• Modify the last statement arranging the solution as

x = [x0 trid(A,b)']';

Now that you have the routine “bvp2m_fdf()” of your own making, don’t hesitate to try it on the following problem:

$$x''(t) + \frac{2}{t}x'(t) - \frac{2}{t^2}x(t) = 0 \quad \text{with} \quad x(1) = 5,\; x(2) + x'(2) = 3 \tag{P6.7.5}$$

For this job, you only have to modify one statement of the program “do_fdf.m” (Section 6.6.2) into

[t,x] = bvp2m_fdf(a1,a0,u,t0,tf,x0,[1 1 3],N);

You might need to increase the number of segments N to improve the accuracy of the numerical solution. If you run it to obtain the same solution as depicted in Fig. 6.9, be happy with it.
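A self-contained sketch of the same recipe applied directly to (P6.7.5), whose true solution is x(t) = t + 4/t² (it satisfies x(1) = 5, x(2) = 3, x'(2) = 0). The code is hypothetical and uses a dense matrix with backslash in place of the book's trid(); it only illustrates how the boundary row (P6.7.4) closes the system.

```matlab
% Sketch (hypothetical, not the book's bvp2m_fdf()): x'' + a1 x' + a0 x = u,
% x(t0) = x0, c1 x(tf) + c2 x'(tf) = c3, applied to (P6.7.5)
a1 = @(t) 2./t; a0 = @(t) -2./t.^2; u = @(t) 0*t;
t0 = 1; tf = 2; x0 = 5; cf = [1 1 3]; N = 500;
h = (tf - t0)/N; t = t0 + (0:N)'*h;
A = zeros(N,N); b = zeros(N,1);          % unknowns x_1 ... x_N
for j = 1:N-1                            % interior node t_j is t(j+1)
   tj = t(j+1);
   if j > 1, A(j,j-1) = 1 - h*a1(tj)/2; end
   A(j,j) = -2 + h^2*a0(tj);
   A(j,j+1) = 1 + h*a1(tj)/2;
   b(j) = h^2*u(tj);
end
b(1) = b(1) - (1 - h*a1(t(2))/2)*x0;    % known x_0 moved to the right-hand side
A(N,N-1:N) = [-cf(2) cf(1)*h + cf(2)];   % Eq. (P6.7.4)
b(N) = cf(3)*h;
x = [x0; A\b];
max_err = max(abs(x - (t + 4./t.^2)))    % true solution of (P6.7.5)
```

The one-sided difference in (P6.7.4) is only first-order accurate, so the error here shrinks roughly in proportion to h, which is why the text suggests increasing N.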

6.8 BVP with Mixed-Boundary Conditions II

Suppose the boundary condition for a second-order BVP is given as

$$c_{01}x(t_0) + c_{02}x'(t_0) = c_{03} \tag{P6.8.1a}$$

$$c_{f1}x(t_f) + c_{f2}x'(t_f) = c_{f3} \tag{P6.8.1b}$$

Consider how to modify the MATLAB routines “bvp2m_shoot()” and “bvp2m_fdf()” so that they can accommodate this kind of problem.

(a) As for “bvp2mm_shoot()” that you should make, the variable quantity to be adjusted for improving the approximate solution is $x'(t_0)$ or $x(t_0)$ depending on whether or not $c_{01} \ne 0$, while the quantity to be made close to zero is still

$$f(x(t_0), x'(t_0)) = c_{f1}x(t_f) + c_{f2}x'(t_f) - c_{f3} \tag{P6.8.2}$$

If you don’t have your own idea, modify the routine “bvp2m_shoot()” in such a way that $x'(t_0)$ or $x(t_0)$ is adjusted to make this quantity close to zero and $x(t_0)$ or $x'(t_0)$ is set by (P6.8.1a), making its declaration as

function [t,x] = bvp2mm_shoot(f,t0,tf,c0,cf,N,tol,kmax)

where the boundary coefficient vectors c0 = [c01 c02 c03] and cf = [cf1 cf2 cf3] are supposed to be given as the fourth and fifth input arguments, respectively.

Now that you have the routine “bvp2mm_shoot()” of your own making, try it on the following problem:

$$x''(t) - \frac{2t}{t^2 + 1}x'(t) + \frac{2}{t^2 + 1}x(t) = t^2 + 1 \quad \text{with} \quad x(0) + 6x'(0) = 0,\; x'(1) + x(1) = 0 \tag{P6.8.3}$$

(b) As for “bvp2_fdf()” implementing the finite difference method, you only have to augment the matrix-vector equation with two rows corresponding to the approximate versions of the boundary conditions $c_{01}x(t_0) + c_{02}x'(t_0) = c_{03}$ and $c_{f1}x(t_f) + c_{f2}x'(t_f) = c_{f3}$, that is,

$$c_{01}x_0 + c_{02}\frac{x_1 - x_0}{h} = c_{03}; \quad (c_{01}h - c_{02})x_0 + c_{02}x_1 = c_{03}h \tag{P6.8.4a}$$

$$c_{f1}x_N + c_{f2}\frac{x_N - x_{N-1}}{h} = c_{f3}; \quad -c_{f2}x_{N-1} + (c_{f1}h + c_{f2})x_N = c_{f3}h \tag{P6.8.4b}$$

Now that you have the routine “bvp2mm_fdf()” of your own making, try it on the problem described by Eq. (P6.8.3).

(c) Overall, you will need to make the main programs like “nm6p08a.m” and “nm6p08b.m” that apply the routines “bvp2mm_shoot()” and “bvp2mm_fdf()” to get the numerical solutions of Eq. (P6.8.3) and plot them. Additionally, use the MATLAB routine “bvp4c()” to get another solution and plot it together for cross-check.

6.9 Shooting Method and Finite Difference Method for Linear BVPs

Apply the routines “bvp2_shoot()”, “bvp2_fdf()”, and “bvp4c()” to solve the following BVPs.





%nm6p08a.m: to solve BVP2 with mixed boundary conditions
% x" = (2t/(t^2 + 1))*x' - 2/(t^2 + 1)*x + t^2 + 1
% with x(0) + 6x'(0) = 0, x'(1) + x(1) = 0
% shooting method
f = @(t,x) [x(2); 2*(t*x(2) - x(1))./(t.^2 + 1) + (t.^2 + 1)];
t0 = 0; tf = 1; N = 100; tol = 1e-8; kmax = 10;
c0 = [1 6 0]; cf = [1 1 0]; %coefficient vectors of boundary condition
[tt,x_sh] = bvp2mm_shoot(f,t0,tf,c0,cf,N,tol,kmax);
plot(tt,x_sh(:,1),'b')


%nm6p08b.m: finite difference method
% x" + a1(t)x' + a0(t)x = u(t) form of Eq. (P6.8.3)
a1 = @(t) -2*t./(t.^2 + 1); a0 = @(t) 2./(t.^2 + 1);
u = @(t) t.^2 + 1;
t0 = 0; tf = 1; N = 500;
c0 = [1 6 0]; cf = [1 1 0]; %coefficient vectors of boundary condition
[tt,x_fd] = bvp2mm_fdf(a1,a0,u,t0,tf,c0,cf,N);
plot(tt,x_fd,'r')


$$y''(x) = f(y'(x), y(x), u(x)) \quad \text{with} \quad y(x_0) = y_0,\; y(x_f) = y_f \tag{P6.9.0a}$$

Plot the solutions and fill in Table P6.9 with the mismatching errors (of the 
numerical solutions) that are defined as 


function err = err_of_sol_de(df,t,x,varargin)
% evaluate the error of solutions of differential equation
[Nt,Nx] = size(x); if Nt < Nx, x = x.'; [Nt,Nx] = size(x); end
n1 = 2:Nt - 1; t = t(:); h2s = t(n1 + 1) - t(n1 - 1);
dx = (x(n1 + 1,:) - x(n1 - 1,:))./(h2s*ones(1,Nx));
num = x(n1 + 1,:) - 2*x(n1,:) + x(n1 - 1,:); den = (h2s/2).^2*ones(1,Nx);
d2x = num./den; % second-order difference D(2)y of (P6.9.0c)
for m = 1:Nx % for each solution (column)
   for n = n1(1):n1(end)
      dfx = feval(df,t(n),[x(n,m) dx(n - 1,m)],varargin{:});
      errm(n - 1,m) = d2x(n - 1,m) - dfx(end);
   end
end
err = sum(errm.^2)/(Nt - 2);


%nm6p09_1.m
%y" - y' + y = 3*e^(2t) - 2sin(t) with y(0) = 5 & y(2) = -10
t0 = 0; tf = 2; y0 = 5; yf = -10; N = 100; tol = 1e-6; kmax = 10;
df = @(t,y) [y(2); y(2) - y(1) + 3*exp(2*t) - 2*sin(t)];
a1 = -1; a0 = 1; u = @(t) 3*exp(2*t) - 2*sin(t);
solinit = bvpinit(linspace(t0,tf,5),[-10 5]); %[1 9]
fbc = @(ya,yb) [ya(1) - 5; yb(1) + 10]; % boundary residuals y(0) - 5, y(2) + 10
% Shooting method
tic, [tt,y_sh] = bvp2_shoot(df,t0,tf,y0,yf,N,tol,kmax); times(1) = toc;
% Finite difference method
tic, [tt,y_fd] = bvp2_fdf(a1,a0,u,t0,tf,y0,yf,N); times(2) = toc;
% MATLAB built-in function bvp4c (timed together with deval)
tic, sol = bvp4c(df,fbc,solinit,bvpset('RelTol',1e-6));
y_bvp = deval(sol,tt); times(3) = toc
% Error evaluation
ys = [y_sh(:,1) y_fd y_bvp(1,:)']; plot(tt,ys)
err = err_of_sol_de(df,tt,ys)
Table P6.9 Comparison of the BVP Solver Routines bvp2_shoot()/bvp2_fdf()
(For every BVP: N = 100, tol = 1e-6, kmax = 10)

BVP        Routine        Mismatching Error (P6.9.0b)   Time
(P6.9.1)   bvp2_shoot()   1.5 × 10^-6
           bvp2_fdf()
           bvp4c()        2.9 × 10^-6
(P6.9.2)   bvp2_shoot()
           bvp2_fdf()     1.6 × 10^-23
           bvp4c()
(P6.9.3)   bvp2_shoot()   1.7 × 10^-17
           bvp2_fdf()
           bvp4c()        7.8 × 10^-14
(P6.9.4)   bvp2_shoot()
           bvp2_fdf()     4.4 × 10^-27
           bvp4c()
(P6.9.5)   bvp2_shoot()   8.9 × 10^-9
           bvp2_fdf()
           bvp4c()        8.9 × 10^-7
(P6.9.6)   bvp2_shoot()
           bvp2_fdf()     4.4 × 10^-25
           bvp4c()

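$$\text{err} = \frac{1}{N-1}\sum_{i=1}^{N-1}\left\{D^{(2)}y(x_i) - f(Dy(x_i), y(x_i), u(x_i))\right\}^2 \tag{P6.9.0b}$$

with

$$D^{(2)}y(x_i) = \frac{y(x_{i+1}) - 2y(x_i) + y(x_{i-1})}{h^2}, \quad Dy(x_i) = \frac{y(x_{i+1}) - y(x_{i-1})}{2h} \tag{P6.9.0c}$$

$$x_i = x_0 + ih, \quad h = \frac{x_f - x_0}{N} \tag{P6.9.0d}$$

and can be computed by using the routine “err_of_sol_de()”.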
Overall, which routine works the best for linear BVPs among the three routines?

(a) $y''(x) = y'(x) - y(x) + 3e^{2x} - 2\sin x$ with $y(0) = 5$, $y(2) = -10$ (P6.9.1)

(b) $y''(x) = -4y(x)$ with $y(0) = 5$, $y(1) = -5$ (P6.9.2)

(c) $y''(t) = 10^{-6}y(t) + 10^{-7}(t^2 - 50t)$ with $y(0) = 0$, $y(50) = 0$ (P6.9.3)

(d) $y''(t) = -2y(t) + \sin t$ with $y(0) = 0$, $y(1) = 0$ (P6.9.4)

(e) $y''(x) = y'(x) + y(x) + e^{x}(1 - 2x)$ with $y(0) = 1$, $y(1) = 3e$ (P6.9.5)

(f) $\dfrac{d^2y}{dr^2} + \dfrac{1}{r}\dfrac{dy}{dr} = 0$ with $y(1) = \ln 1$, $y(2) = \ln 2$ (P6.9.6)

6.10 Shooting Method and Finite Difference Method for Nonlinear BVPs
(a) Consider a nonlinear boundary value problem of solving

$$\frac{d^2T}{dx^2} = 1.9 \times 10^{-9}(T^4 - T_a^4), \quad T_a = 400 \tag{P6.10.1}$$

with the boundary condition $T(x_0) = T_0$, $T(x_f) = T_f$

to find the temperature distribution T(x) [K] in a rod 4 m long, where $[x_0, x_f] = [0, 4]$.

Apply the routines “bvp2_shoot()”, “bvp2_fdf()”, and “bvp4c()” to solve this differential equation for the two sets of boundary conditions {T(0) = 500, T(4) = 300} and {T(0) = 550, T(4) = 300} as listed in Table P6.10. Fill in the table with the mismatching errors defined by Eq. (P6.9.0b) for the three numerical solutions.
Table P6.10 Comparison of the BVP routines bvp2_shoot()/bvp2_fdf()

Boundary Condition                        Routine        Mismatching Error (P6.9.0b)   Time (seconds)
(P6.10.1) with T_a = 400,                 bvp2_shoot()
  T(0) = 500, T(4) = 300                  bvp2_fdf()     3.6 × 10^-6
                                          bvp4c()
(P6.10.1) with T_a = 400,                 bvp2_shoot()   NaN (divergent)               N/A
  T(0) = 550, T(4) = 300                  bvp2_fdf()
                                          bvp4c()        30 × 10^-5
(P6.10.2) with y(0) = 0, y(1) = 0         bvp2_shoot()
                                          bvp2_fdf()     3.2 × 10^-13
                                          bvp4c()
(P6.10.3) with y(1) = 4, y(2) = 8         bvp2_shoot()   NaN (divergent)               N/A
                                          bvp2_fdf()
                                          bvp4c()        3.5 × 10^-6
(P6.10.4) with y(1) = 1/3, y(4) = 20/3    bvp2_shoot()
                                          bvp2_fdf()
                                          bvp4c()        3.4 × 10^-10
(P6.10.5) with y(0) = π/2, y(2) = π/4     bvp2_shoot()   3.7 × 10^-14
                                          bvp2_fdf()
                                          bvp4c()        2.2 × 10^-9
(P6.10.6) with y(2) = 2, y'(8) = 1/4      bvp2_shoot()
                                          bvp2_fdf()     5.0 × 10^-14
                                          bvp4c()

The numerical solutions are $\{T(x_i),\ i = 0 : N\}$ with $x_i = x_0 + ih = x_0 + i(x_f - x_0)/N$ and N = 500.

Note that the routine “bvp2_fdf()” should be applied in an iterative way to solve a nonlinear BVP, because it has been fabricated to accommodate only linear BVPs. You may start with the following program “nm6p10a.m”. Which routine works the best for the first case and for the second case, respectively?
(b) Apply the routines “bvp2_shoot()”, “bvp2_fdf()”, and “bvp4c()” to solve the following BVPs. Fill in Table P6.10 with the mismatching errors defined by Eq. (P6.9.0b) for the three numerical solutions and plot the solution graphs if they are reasonable solutions.

(i) $y'' - e^{y} = 0$ with $y(0) = 0$, $y(1) = 0$ (P6.10.2)

(ii) $y'' - \dfrac{1}{t}y' - \dfrac{2}{y}(y')^2 = 0$ with $y(1) = 4$, $y(2) = 8$ (P6.10.3)

(iii) $y'' - \dfrac{2}{y' + 1} = 0$ with $y(1) = \dfrac{1}{3}$, $y(4) = \dfrac{20}{3}$ (P6.10.4)

(iv) $y'' = t(y')^2$ with $y(0) = \pi/2$, $y(2) = \pi/4$ (P6.10.5)

(v) y" + 4/ = 0 

with y( 2) = 2, y'(8) = 1/4 

(P6.10.6) 


Especially for the BVP (P6.10.6), the routine “bvp2m_shoot()” or “bvp2mm_shoot()” developed in Problems 6.7 and 6.8 should be used instead of “bvp2_shoot()”, since it has a mixed-boundary condition I.

(cf) Originally, the shooting method was developed for solving nonlinear BVPs, while the finite difference method is designed as a one-shot method for solving linear BVPs. But the finite difference method can also be applied in an iterative way to handle nonlinear BVPs, often producing more accurate solutions in less computation time.

6.11 Eigenvalue BVPs

(a) A Homogeneous Second-Order BVP to an Eigenvalue Problem
Consider an eigenvalue boundary value problem of solving

$$y''(x) + \omega^2y = 0 \quad \text{with} \quad c_{01}y(x_0) + c_{02}y'(x_0) = 0,\; c_{f1}y(x_f) + c_{f2}y'(x_f) = 0 \tag{P6.11.1}$$

to find y(x) for $x \in [x_0, x_f]$ with the (possible) angular frequency ω.

In order to use the finite difference method, we divide the solution interval $[x_0, x_f]$ into N subintervals to have the grid points $x_i = x_0 + ih = x_0 + i(x_f - x_0)/N$ and then replace the derivatives in the differential equation and the boundary conditions by their finite difference approximations (5.3.1) and (5.1.8) to write

$$\frac{y_{i-1} - 2y_i + y_{i+1}}{h^2} + \omega^2y_i = 0 \;\rightarrow\; y_{i-1} - (2 - \lambda)y_i + y_{i+1} = 0 \quad \text{with} \quad \lambda = h^2\omega^2 \tag{P6.11.2}$$
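$$c_{01}y_0 + c_{02}\frac{y_1 - y_{-1}}{2h} = 0 \;\rightarrow\; y_{-1} = 2h\frac{c_{01}}{c_{02}}y_0 + y_1 \tag{P6.11.3a}$$

$$c_{f1}y_N + c_{f2}\frac{y_{N+1} - y_{N-1}}{2h} = 0 \;\rightarrow\; y_{N+1} = y_{N-1} - 2h\frac{c_{f1}}{c_{f2}}y_N \tag{P6.11.3b}$$

Substituting the discretized boundary condition (P6.11.3) into (P6.11.2) yields

$$y_{-1} - 2y_0 + y_1 = -\lambda y_0 \;\rightarrow\; \left(2 - 2h\frac{c_{01}}{c_{02}}\right)y_0 - 2y_1 = \lambda y_0 \tag{P6.11.4a}$$

$$y_{i-1} - 2y_i + y_{i+1} = -\lambda y_i \;\rightarrow\; -y_{i-1} + 2y_i - y_{i+1} = \lambda y_i \quad \text{for } i = 1 : N-1 \tag{P6.11.4b}$$

$$y_{N-1} - 2y_N + y_{N+1} = -\lambda y_N \;\rightarrow\; -2y_{N-1} + \left(2 + 2h\frac{c_{f1}}{c_{f2}}\right)y_N = \lambda y_N \tag{P6.11.4c}$$

which can be formulated in a compact form as

$$\begin{bmatrix} 2 - 2hc_{01}/c_{02} & -2 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & \cdots & -1 & 2 & -1 \\ 0 & 0 & \cdots & 0 & -2 & 2 + 2hc_{f1}/c_{f2} \end{bmatrix}\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_{N-1} \\ y_N \end{bmatrix} = \lambda\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_{N-1} \\ y_N \end{bmatrix}; \quad [A - \lambda I]y = 0 \tag{P6.11.5}$$

For this equation to have a nontrivial solution y ≠ 0, λ must be one of the eigenvalues of the matrix A and the corresponding eigenvectors are possible solutions.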
function [x,Y,ws,eigvals] = bvp2_eig(x0,xf,c0,cf,N)
% use the finite difference method to solve an eigenvalue BVP2:
%  y" + w^2*y = 0 with c01y(x0) + c02y'(x0) = 0, cf1y(xf) + cf2y'(xf) = 0
%input: x0/xf = the initial/final boundaries
%       c0/cf = the initial/final boundary condition coefficients
%       N - 1 = the number of internal grid points.
%output: x = the vector of grid points
%        Y = the matrix composed of the eigenvector solutions
%        ws = angular frequencies corresponding to eigenvalues
%        eigvals = the eigenvalues
if nargin < 5 || N < 3, N = 3; end
h = (xf - x0)/N; h2 = h*h; x = x0 + [0:N]*h;
N1 = N + 1; % unknowns y0 ... yN, reduced below for each fixed (zero) boundary value
if abs(c0(2)) < eps, N1 = N1 - 1; A(1,1:2) = [2 -1];
else A(1,1:2) = [2*(1 - c0(1)/c0(2)*h) -2]; %(P6.11.4a)
end
if abs(cf(2)) < eps, N1 = N1 - 1; A(N1,N1 - 1:N1) = [-1 2];
else A(N1,N1 - 1:N1) = [-2 2*(1 + cf(1)/cf(2)*h)]; %(P6.11.4c)
end
if N1 > 2
   for m = 2:ceil(N1/2), A(m,m - 1:m + 1) = [-1 2 -1]; end %(P6.11.4b)
end
for m = ceil(N1/2) + 1:N1 - 1, A(m,:) = fliplr(A(N1 + 1 - m,:)); end
[V,LAMBDA] = eig(A); eigvals = diag(LAMBDA)';
[eigvals,I] = sort(eigvals); % sorting in the ascending order
V = V(:,I); % reorder the eigenvectors to match the sorted eigenvalues
ws = sqrt(eigvals)/h;
if abs(c0(2)) < eps, Y = zeros(1,N1); else Y = []; end
Y = [Y; V];
if abs(cf(2)) < eps, Y = [Y; zeros(1,N1)]; end


Note the following:

• The angular frequency corresponding to the eigenvalue $\lambda$ can be obtained as

$$ \omega = \sqrt{\lambda}\,/h \tag{P6.11.6} $$

• The eigenvalues and the eigenvectors of a matrix A can be obtained by using the MATLAB command ‘[V,D] = eig(A)’.

• The routine “bvp2_eig()” above implements this scheme to solve the second-order eigenvalue problem (P6.11.1).

• In particular, a second-order eigenvalue BVP

$$ y''(x) + \omega^2 y = 0 \quad\text{with } y(x_0) = 0,\ y(x_f) = 0 \tag{P6.11.7} $$

corresponds to (P6.11.1) with $c_0 = [c_{01}\ c_{02}] = [1\ 0]$ and $c_f = [c_{f1}\ c_{f2}] = [1\ 0]$, and has the following analytical solutions:

$$ y(x) = a\sin\omega(x - x_0) \quad\text{with } \omega = \frac{k\pi}{x_f - x_0},\ k = 1, 2, \ldots \tag{P6.11.8} $$
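For the Dirichlet case (P6.11.7), the discrete eigenpairs are known in closed form: $\lambda_k = 2 - 2\cos(k\pi/N)$, with eigenvector entries $\sin(k\pi i/N)$ on the interior nodes. The Python sketch below checks this numerically. It uses the standard library only, and the helper names are hypothetical. It then compares $\omega = \sqrt{\lambda}/h$ from (P6.11.6) with $k\pi/(x_f - x_0)$ from (P6.11.8), using the same N = 256, $x_0 = 0$, $x_f = 2$ as the exercise. Because it avoids a general eigen-solver, it verifies the discretization rather than reproducing “bvp2_eig()”.

```python
import math

def dirichlet_matrix(N):
    """(N-1) x (N-1) matrix of (P6.11.4b) when y_0 = y_N = 0."""
    n = N - 1
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    return A

def residual(A, v, lam):
    """Largest entry of |A v - lam v|."""
    return max(abs(sum(a * b for a, b in zip(row, v)) - lam * vi)
               for row, vi in zip(A, v))

x0, xf, N = 0.0, 2.0, 256
h = (xf - x0) / N
A = dirichlet_matrix(N)
results = []
for k in (1, 2, 3):
    theta = k * math.pi / N
    lam = 2.0 - 2.0 * math.cos(theta)                  # eigenvalue of A
    v = [math.sin(i * theta) for i in range(1, N)]     # samples of sin(w(x - x0))
    w_fd = math.sqrt(lam) / h                          # (P6.11.6)
    w_exact = k * math.pi / (xf - x0)                  # (P6.11.8)
    results.append((k, residual(A, v, lam), w_fd, w_exact))
for row in results:
    print(row)
```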
Figure P6.11 The eigenvector solutions of homogeneous second-order and fourth-order BVPs: (a) eigenvector solutions for BVP2; (b) eigenvector solutions for BVP4.

Now, use the routine “bvp2_eig()” with the number of grid points N = 256 to solve the BVP2 (P6.11.7) with $x_0 = 0$ and $x_f = 2$, find the lowest three angular frequencies ($\omega_i$'s), and plot the corresponding eigenvector solutions as depicted in Fig. P6.11a.

(b) A Homogeneous Fourth-Order BVP to an Eigenvalue Problem
Consider an eigenvalue boundary value problem of solving

$$ \frac{d^4y}{dx^4} - \omega^4 y = 0 \quad\text{with } y(x_0) = 0,\ \frac{d^2y}{dx^2}(x_0) = 0,\ y(x_f) = 0,\ \frac{d^2y}{dx^2}(x_f) = 0 \tag{P6.11.9} $$

to find $y(x)$ for $x \in [x_0, x_f]$ with the (possible) angular frequency $\omega$.

In order to use the finite difference method, we divide the solution interval $[x_0, x_f]$ into N subintervals to have the grid points $x_i = x_0 + ih = x_0 + i(x_f - x_0)/N$, and then replace the derivatives in the differential equation and the boundary conditions by their finite difference approximations to write


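$$ \frac{y_{i-2} - 4y_{i-1} + 6y_i - 4y_{i+1} + y_{i+2}}{h^4} - \omega^4 y_i = 0 \;\rightarrow\; y_{i-2} - 4y_{i-1} + 6y_i - 4y_{i+1} + y_{i+2} = \lambda y_i \quad (\lambda = h^4\omega^4) \tag{P6.11.10} $$

$$ y_0 = 0,\quad \frac{y_{-1} - 2y_0 + y_1}{h^2} = 0 \;\rightarrow\; y_{-1} = -y_1 \tag{P6.11.11a} $$

$$ y_N = 0,\quad \frac{y_{N-1} - 2y_N + y_{N+1}}{h^2} = 0 \;\rightarrow\; y_{N+1} = -y_{N-1} \tag{P6.11.11b} $$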
Substituting the discretized boundary condition (P6.11.11) into (P6.11.10) yields

$$
\begin{aligned}
&y_{-1} - 4y_0 + 6y_1 - 4y_2 + y_3 = \lambda y_1 \;\xrightarrow{\text{(P6.11.11a)}}\; 5y_1 - 4y_2 + y_3 = \lambda y_1\\
&y_0 - 4y_1 + 6y_2 - 4y_3 + y_4 = \lambda y_2 \;\xrightarrow{\text{(P6.11.11a)}}\; -4y_1 + 6y_2 - 4y_3 + y_4 = \lambda y_2\\
&y_i - 4y_{i+1} + 6y_{i+2} - 4y_{i+3} + y_{i+4} = \lambda y_{i+2} \quad\text{for } i = 1:N-5\\
&y_{N-4} - 4y_{N-3} + 6y_{N-2} - 4y_{N-1} + y_N = \lambda y_{N-2} \;\xrightarrow{\text{(P6.11.11b)}}\; y_{N-4} - 4y_{N-3} + 6y_{N-2} - 4y_{N-1} = \lambda y_{N-2}\\
&y_{N-3} - 4y_{N-2} + 6y_{N-1} - 4y_N + y_{N+1} = \lambda y_{N-1} \;\xrightarrow{\text{(P6.11.11b)}}\; y_{N-3} - 4y_{N-2} + 5y_{N-1} = \lambda y_{N-1}
\end{aligned}
\tag{P6.11.12}
$$

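which can be formulated in a compact form as

$$
\begin{bmatrix}
5 & -4 & 1 & 0 & 0 & 0 & 0\\
-4 & 6 & -4 & 1 & 0 & 0 & 0\\
1 & -4 & 6 & -4 & 1 & 0 & 0\\
 & \ddots & \ddots & \ddots & \ddots & \ddots & \\
0 & 0 & 1 & -4 & 6 & -4 & 1\\
0 & 0 & 0 & 1 & -4 & 6 & -4\\
0 & 0 & 0 & 0 & 1 & -4 & 5
\end{bmatrix}
\begin{bmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_{N-3}\\ y_{N-2}\\ y_{N-1} \end{bmatrix}
= \lambda
\begin{bmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_{N-3}\\ y_{N-2}\\ y_{N-1} \end{bmatrix},
\qquad A\mathbf{y} = \lambda\mathbf{y},\quad [A - \lambda I]\mathbf{y} = \mathbf{0}
\tag{P6.11.13}
$$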


For this equation to have a nontrivial solution $\mathbf{y} \ne \mathbf{0}$, $\lambda$ must be one of the eigenvalues of the matrix $A$, and the corresponding eigenvectors are possible solutions. Note that the angular frequency corresponding to the eigenvalue $\lambda$ can be obtained as

$$ \omega = \sqrt[4]{\lambda}\,/h \tag{P6.11.14} $$
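The matrix in (P6.11.13) equals the square of the Dirichlet tridiagonal matrix from part (a). Its corner entry, for example, is $5 = 2\cdot 2 + (-1)(-1)$. It therefore shares the eigenvectors $\sin(k\pi i/N)$ and has the eigenvalues $\lambda_k = (2 - 2\cos(k\pi/N))^2$. Since $y = \sin\omega(x - x_0)$ with $\omega = k\pi/(x_f - x_0)$ satisfies (P6.11.9), (P6.11.14) should approximate those values. The Python sketch below checks both claims; the helper names are hypothetical, and it is not the “bvp4_eig()” routine the exercise asks for.

```python
import math

def bvp4_matrix(N):
    """(N-1) x (N-1) pentadiagonal matrix of (P6.11.13) for y_1..y_{N-1}."""
    n = N - 1
    A = [[0.0] * n for _ in range(n)]
    stencil = {-2: 1.0, -1: -4.0, 0: 6.0, 1: -4.0, 2: 1.0}   # (P6.11.10)
    for i in range(n):
        for offset, value in stencil.items():
            j = i + offset
            if 0 <= j < n:
                A[i][j] = value
    A[0][0] = A[n - 1][n - 1] = 5.0   # from y_{-1} = -y_1 and y_{N+1} = -y_{N-1}
    return A

def residual(A, v, lam):
    """Largest entry of |A v - lam v|."""
    return max(abs(sum(a * b for a, b in zip(row, v)) - lam * vi)
               for row, vi in zip(A, v))

x0, xf, N = 0.0, 2.0, 256
h = (xf - x0) / N
A = bvp4_matrix(N)
results = []
for k in (1, 2, 3):
    theta = k * math.pi / N
    lam = (2.0 - 2.0 * math.cos(theta)) ** 2          # eigenvalue of A
    v = [math.sin(i * theta) for i in range(1, N)]
    w_fd = lam ** 0.25 / h                            # (P6.11.14)
    results.append((k, residual(A, v, lam), w_fd, k * math.pi / (xf - x0)))
for row in results:
    print(row)
```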


(i) Compose a routine “bvp4_eig()” which implements the above- 
mentioned scheme to solve the fourth-order eigenvalue problem 
(P6.11.9). 


function [x,Y,ws,eigvals] = bvp4_eig(x0,xf,N) 
(ii) Use the routine “bvp4_eig()” with the number of grid points N = 256 to solve the BVP4 (P6.11.9) with $x_0 = 0$ and $x_f = 2$, find the lowest three angular frequencies ($\omega_i$'s), and plot the corresponding eigenvector solutions as depicted in Fig. P6.11b.

(c) The Sturm-Liouville Equation

Consider an eigenvalue boundary value problem of solving

$$ \frac{d}{dx}\bigl(f(x)y'\bigr) + r(x)y = \lambda q(x)y \quad\text{with } y(x_0) = 0,\ y(x_f) = 0 \tag{P6.11.15} $$

to find $y(x)$ for $x \in [x_0, x_f]$ with the (possible) angular frequency $\omega$.

In order to use the finite difference method, we divide the solution interval $[x_0, x_f]$ into N subintervals to have the grid points $x_i = x_0 + ih = x_0 + i(x_f - x_0)/N$, and then we replace the derivatives in the differential equation and the boundary conditions by their finite difference approximations (with the step size h/2) to write

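$$
\begin{aligned}
&\frac{f(x_i + h/2)\,y'(x_i + h/2) - f(x_i - h/2)\,y'(x_i - h/2)}{h} + r(x_i)y_i = \lambda q(x_i)y_i\\
&\frac{1}{h}\Bigl[f\bigl(x_i + \tfrac{h}{2}\bigr)\frac{y_{i+1} - y_i}{h} - f\bigl(x_i - \tfrac{h}{2}\bigr)\frac{y_i - y_{i-1}}{h}\Bigr] + r(x_i)y_i = \lambda q(x_i)y_i\\
&a_i y_{i-1} + b_i y_i + c_i y_{i+1} = \lambda y_i \quad\text{for } i = 1, 2, \ldots, N-1
\end{aligned}
\tag{P6.11.16}
$$

with

$$ a_i = \frac{f(x_i - h/2)}{h^2 q(x_i)},\quad c_i = \frac{f(x_i + h/2)}{h^2 q(x_i)},\quad\text{and}\quad b_i = \frac{r(x_i)}{q(x_i)} - a_i - c_i \tag{P6.11.17} $$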
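(P6.11.17) reduces the Sturm-Liouville problem to the same tridiagonal eigenvalue form as part (a). The Python sketch below covers only that assembly step; the names `sturm_coeffs` and `sturm_matrix` are hypothetical, and “sturm()” itself is left to the exercise. Because $y_0 = y_N = 0$, the first row drops $a_1$ and the last row drops $c_{N-1}$. The example uses the data of (P6.11.18): $f(x) = 1 + x^2$, $r = 0$, $q = -2$. The interval [0, 1] is an assumption, since the problem does not fix $x_0$ and $x_f$ here. As a sanity check, for $f = q = 1$ and $r = 0$ the coefficients collapse to $a_i = c_i = 1/h^2$ and $b_i = -2/h^2$.

```python
def sturm_coeffs(f, r, q, x0, xf, N):
    """Coefficients a_i, b_i, c_i of (P6.11.17) for i = 1..N-1."""
    h = (xf - x0) / N
    a, b, c = [], [], []
    for i in range(1, N):
        x = x0 + i * h
        ai = f(x - h / 2) / (h * h * q(x))
        ci = f(x + h / 2) / (h * h * q(x))
        a.append(ai)
        c.append(ci)
        b.append(r(x) / q(x) - ai - ci)
    return a, b, c

def sturm_matrix(a, b, c):
    """Tridiagonal matrix of (P6.11.16); y_0 = y_N = 0 are dropped."""
    n = len(b)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = b[i]
        if i > 0:
            A[i][i - 1] = a[i]
        if i < n - 1:
            A[i][i + 1] = c[i]
    return A

# data of (P6.11.18) on an assumed interval [0, 1] with N = 4
a, b, c = sturm_coeffs(lambda x: 1 + x * x, lambda x: 0.0, lambda x: -2.0, 0.0, 1.0, 4)
A = sturm_matrix(a, b, c)
for row in A:
    print(row)
```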

(i) Compose a routine “sturm()” which implements the above-mentioned scheme to solve the Sturm-Liouville BVP (P6.11.15).

function [x,Y,ws,eigvals] = sturm(f,r,q,x0,xf,N)

(ii) Use the routine “sturm()” with the number of grid points N = 256 to solve the following BVP2:

$$ \frac{d}{dx}\bigl((1 + x^2)y'\bigr) = -2\lambda y \quad\text{with } y(x_0) = 0,\ y(x_f) = 0 \tag{P6.11.18} $$

Plot the eigenvector solutions corresponding to the lowest three angular frequencies ($\omega_i$'s).
Optimization involves finding the minimum/maximum of an objective function $f(x)$ subject to some constraint $x \in S$. If there is no constraint for x to satisfy (or, equivalently, S is the universe), it is called an unconstrained optimization; otherwise, it is a constrained optimization. In this chapter, we will cover several unconstrained optimization techniques such as the golden search method, the quadratic approximation method, the Nelder-Mead method, the steepest descent method, the Newton method, the simulated-annealing (SA) method, and the genetic algorithm (GA). As for constrained optimization, we will only introduce the MATLAB built-in routines together with the routines for unconstrained optimization. Note that we don't have to distinguish maximization and minimization, because maximizing $f(x)$ is equivalent to minimizing $-f(x)$. Without loss of generality, we therefore deal only with minimization problems.


7.1 UNCONSTRAINED OPTIMIZATION [L-2, CHAPTER 7] 

7.1.1 Golden Search Method 

This method is applicable to an unconstrained minimization problem such that the solution interval $[a, b]$ is known and the objective function $f(x)$ is unimodal within the interval. That is, the sign of its derivative $f'(x)$ changes at most once in $[a, b]$, so that $f(x)$ decreases/increases monotonically for $[a, x^o]/[x^o, b]$, where $x^o$ is the solution that we are looking for. The so-called golden search procedure is summarized below and is cast into the routine “opt_gs()”. We made a MATLAB program “nm711.m”, which uses this routine to find the minimum point of the objective function

$$ f(x) = \frac{(x^2 - 4)^2}{8} - 1 \tag{7.1.1} $$


GOLDEN SEARCH PROCEDURE 

Step 1. Pick the two points $c = a + (1 - r)h$ and $d = a + rh$ inside the interval $[a, b]$, where $r = (\sqrt{5} - 1)/2$ and $h = b - a$.

Step 2. If the values of $f(x)$ at the two points are almost equal [i.e., $f(c) \approx f(d)$] and the width of the interval is sufficiently small (i.e., $h \approx 0$), then stop the iteration to exit the loop and declare $x^o = c$ or $x^o = d$ depending on whether $f(c) < f(d)$ or not. Otherwise, go to Step 3.

Step 3. If $f(c) < f(d)$, let the new upper bound of the interval be $b \leftarrow d$; otherwise, let the new lower bound of the interval be $a \leftarrow c$. Then, go to Step 1.


function [xo,fo] = opt_gs(f,a,b,r,TolX,TolFun,k)
h = b - a; rh = r*h; c = b - rh; d = a + rh; %Step 1: c = a + (1 - r)h, d = a + rh
fc = feval(f,c); fd = feval(f,d);
if k <= 0 | (abs(h) < TolX & abs(fc - fd) < TolFun) %Step 2
  if fc <= fd, xo = c; fo = fc;
  else xo = d; fo = fd;
  end
  if k == 0, fprintf('Just the best in given # of iterations'), end
else %Step 3: shrink the interval and recurse
  if fc < fd, [xo,fo] = opt_gs(f,a,d,r,TolX,TolFun,k - 1);
  else [xo,fo] = opt_gs(f,c,b,r,TolX,TolFun,k - 1);
  end
end


%nm711.m to perform the golden search method
f711 = inline('(x.*x-4).^2/8-1','x');
a = 0; b = 3; r = (sqrt(5)-1)/2; TolX = 1e-4; TolFun = 1e-4; MaxIter = 100;
[xo,fo] = opt_gs(f711,a,b,r,TolX,TolFun,MaxIter)


Figure 7.1 shows how the routine “opt_gs()” proceeds toward the minimum 
point step by step. 
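The recursion in “opt_gs()” can equally be written as a loop. The Python sketch below is an illustrative translation; the name `golden_search` and its default tolerances are assumptions chosen to match “nm711.m”. Like “opt_gs()”, it re-evaluates f at both interior points on every pass, instead of reusing the point that carries over to the next interval.

```python
import math

def golden_search(f, a, b, tol_x=1e-4, tol_f=1e-4, max_iter=100):
    """Golden search (Steps 1-3 above) written as a loop instead of recursion."""
    r = (math.sqrt(5) - 1) / 2                    # golden ratio
    for _ in range(max_iter):
        h = b - a
        c, d = a + (1 - r) * h, a + r * h         # Step 1
        fc, fd = f(c), f(d)
        if abs(h) < tol_x and abs(fc - fd) < tol_f:
            break                                 # Step 2: converged
        if fc < fd:                               # Step 3
            b = d
        else:
            a = c
    return (c, fc) if fc <= fd else (d, fd)

def f711(x):
    return (x * x - 4) ** 2 / 8 - 1               # objective (7.1.1)

xo, fo = golden_search(f711, 0.0, 3.0)
print(xo, fo)                                     # close to the true minimum (2, -1)
```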

Note the following points about the golden search procedure. 

• At every iteration, the new interval width is

$$ b - c = b - \{a + (1 - r)(b - a)\} = rh \quad\text{or}\quad d - a = a + rh - a = rh \tag{7.1.2} $$

so that it becomes r times the old interval width ($b - a = h$).

• The golden ratio r is fixed so that a point $c_1 = b_1 - rh_1 = b - r^2h$ in the new interval $[c, b]$ conforms with $d = a + rh = b - (1 - r)h$, that is,

$$ r^2 = 1 - r,\quad r^2 + r - 1 = 0,\quad r = \frac{-1 + \sqrt{1 + 4}}{2} = \frac{\sqrt{5} - 1}{2} \tag{7.1.3} $$


7.1.2 Quadratic Approximation Method 

The idea of this method is to (a) approximate the objective function $f(x)$ by a quadratic function $p_2(x)$ matching the previous three (estimated solution) points and (b) keep updating the three points by replacing one of them with the minimum point of $p_2(x)$. More specifically, for the three points

$$ \{(x_0, f_0), (x_1, f_1), (x_2, f_2)\} \quad\text{with } x_0 < x_1 < x_2 $$

we find the interpolation polynomial $p_2(x)$ of degree 2 to fit them and replace one of them with the zero of the derivative, that is, the root of $p_2'(x) = 0$ [see Eq. (P3.1.2) in Problem 3.1]:

$$ x = x_3 = \frac{f_0(x_1^2 - x_2^2) + f_1(x_2^2 - x_0^2) + f_2(x_0^2 - x_1^2)}{2\{f_0(x_1 - x_2) + f_1(x_2 - x_0) + f_2(x_0 - x_1)\}} \tag{7.1.4} $$

In particular, if the previous estimated solution points are equidistant with an equal distance h (i.e., $x_2 - x_1 = x_1 - x_0 = h$), then this formula becomes

$$ x_3 = \left.\frac{f_0(x_1^2 - x_2^2) + f_1(x_2^2 - x_0^2) + f_2(x_0^2 - x_1^2)}{2\{f_0(x_1 - x_2) + f_1(x_2 - x_0) + f_2(x_0 - x_1)\}}\right|_{x_1 = x_0 + h,\ x_2 = x_1 + h} = x_0 + h\,\frac{3f_0 - 4f_1 + f_2}{2(f_0 - 2f_1 + f_2)} \tag{7.1.5} $$
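A quick numerical check of (7.1.4) and its equidistant form (7.1.5): for an exact parabola, both must return its vertex, whatever three points are used. The function names are hypothetical, and the test parabola $(x - 1.3)^2 + 0.5$ is an arbitrary choice.

```python
def quad_vertex(x0, x1, x2, f0, f1, f2):
    """Minimum point x3 of the parabola through three points, Eq. (7.1.4)."""
    num = f0 * (x1**2 - x2**2) + f1 * (x2**2 - x0**2) + f2 * (x0**2 - x1**2)
    den = 2 * (f0 * (x1 - x2) + f1 * (x2 - x0) + f2 * (x0 - x1))
    return num / den

def quad_vertex_equidistant(x0, h, f0, f1, f2):
    """Same point when x1 = x0 + h and x2 = x0 + 2h, Eq. (7.1.5)."""
    return x0 + h * (3 * f0 - 4 * f1 + f2) / (2 * (f0 - 2 * f1 + f2))

def f(x):
    return (x - 1.3) ** 2 + 0.5        # vertex at x = 1.3

x_gen = quad_vertex(0.0, 1.0, 2.0, f(0.0), f(1.0), f(2.0))
x_eq = quad_vertex_equidistant(0.0, 1.0, f(0.0), f(1.0), f(2.0))
x_uneven = quad_vertex(0.0, 0.5, 2.0, f(0.0), f(0.5), f(2.0))
print(x_gen, x_eq, x_uneven)
```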
We keep updating the three points this way until $|x_2 - x_0| \approx 0$ and/or $|f(x_2) - f(x_0)| \approx 0$, when we stop the iteration and declare $x_3$ as the minimum point. The rule for updating the three points is as follows.

1. In case $x_0 < x_3 < x_1$, we take $\{x_0, x_3, x_1\}$ or $\{x_3, x_1, x_2\}$ as the new set of three points depending on whether $f(x_3) < f(x_1)$ or not.

2. In case $x_1 < x_3 < x_2$, we take $\{x_1, x_3, x_2\}$ or $\{x_0, x_1, x_3\}$ as the new set of three points depending on whether $f(x_3) \le f(x_1)$ or not.

This procedure, called the quadratic approximation method, is cast into the MATLAB routine “opt_quad()”, which has a nested (recursive call) structure. We made the MATLAB program “nm712.m”, which uses this routine to find the minimum point of the objective function (7.1.1) and also uses the MATLAB built-in routine “fminbnd()” to find it for cross-check. Figure 7.2 shows how the routine “opt_quad()” proceeds toward the minimum point step by step.

(cf) The MATLAB built-in routine “fminbnd()” corresponds to “fmin()” in MATLAB version 5.x.


function [xo,fo] = opt_quad(f,x0,TolX,TolFun,MaxIter)
%search for the minimum of f(x) by quadratic approximation method
if nargin < 5, MaxIter = 100; end %default # of iterations
if nargin < 4, TolFun = 1e-8; end
if nargin < 3, TolX = 1e-6; end
if length(x0) > 2, x012 = x0(1:3);
else
  if length(x0) == 2, a = x0(1); b = x0(2);
  else a = x0 - 10; b = x0 + 10;
  end
  x012 = [a (a + b)/2 b];
end
f012 = feval(f,x012);
[xo,fo] = opt_quad0(f,x012,f012,TolX,TolFun,MaxIter);

function [xo,fo] = opt_quad0(f,x012,f012,TolX,TolFun,k)
x0 = x012(1); x1 = x012(2); x2 = x012(3);
f0 = f012(1); f1 = f012(2); f2 = f012(3);
nd = [f0 - f2 f1 - f0 f2 - f1]*[x1*x1 x2*x2 x0*x0; x1 x2 x0]';
x3 = nd(1)/2/nd(2); f3 = feval(f,x3); %Eq.(7.1.4)
if k <= 0 | abs(x3 - x1) < TolX | abs(f3 - f1) < TolFun
  xo = x3; fo = f3;
  if k == 0, fprintf('Just the best in given # of iterations'), end
else
  if x3 < x1
    if f3 < f1, x012 = [x0 x3 x1]; f012 = [f0 f3 f1];
    else x012 = [x3 x1 x2]; f012 = [f3 f1 f2];
    end
  else
    if f3 <= f1, x012 = [x1 x3 x2]; f012 = [f1 f3 f2];
    else x012 = [x0 x1 x3]; f012 = [f0 f1 f3];
    end
  end
  [xo,fo] = opt_quad0(f,x012,f012,TolX,TolFun,k - 1);
end
Figure 7.2 Process of searching for the minimum by the quadratic approximation method.


%nm712.m to perform the quadratic approximation method
clear, clf
f711 = inline('(x.*x - 4).^2/8-1', 'x');
a = 0; b = 3; TolX = 1e-5; TolFun = 1e-8; MaxIter = 100;
[xoq,foq] = opt_quad(f711,[a b],TolX,TolFun,MaxIter)
%minimum point and its function value
[xob,fob] = fminbnd(f711,a,b) %MATLAB built-in function


7.1.3 Nelder-Mead Method [W-8] 

The Nelder-Mead method is applicable to the minimization of a multivariable objective function, for which neither the golden search method nor the quadratic approximation method can be applied. The algorithm of the Nelder-Mead method summarized in the box below is cast into the MATLAB routine “Nelder0()”. Note that in the N-dimensional case (N > 2), this algorithm should be repeated for each subplane, as implemented in the outer routine “opt_Nelder()”.

We made the MATLAB program “nm713.m” to minimize a two-variable objective function

$$ f(x_1, x_2) = x_1^2 - x_1x_2 - 4x_1 + x_2^2 - x_2 \tag{7.1.6} $$

whose minimum can be found in an analytical way, that is, by setting the partial derivatives of $f(x_1, x_2)$ with respect to $x_1$ and $x_2$ to zero as

$$ \frac{\partial}{\partial x_1}f(x_1, x_2) = 2x_1 - x_2 - 4 = 0,\quad \frac{\partial}{\partial x_2}f(x_1, x_2) = 2x_2 - x_1 - 1 = 0 \;\rightarrow\; \mathbf{x}^o = (x_{1o}, x_{2o}) = (3, 2) $$
NELDER-MEAD ALGORITHM

Step 1. Let the initial three estimated solution points be a, b, and c, where $f(a) < f(b) < f(c)$.

Step 2. If the three points or their function values are sufficiently close to each other, then declare a to be the minimum and terminate the procedure.

Step 3. Otherwise, expecting that the minimum we are looking for may be on the opposite side of the worst point c over the line ab (see Fig. 7.3), take

$$ e = m + 2(m - c), \quad\text{where } m = (a + b)/2 $$

and if $f(e) < f(b)$, take e as the new c; otherwise, take

$$ r = (m + e)/2 = 2m - c $$

and if $f(r) < f(c)$, take r as the new c; if $f(r) \ge f(b)$, take

$$ s = (c + m)/2 $$

and if $f(s) < f(c)$, take s as the new c; otherwise, give up the two points b, c and take m and $c_1 = (a + c)/2$ as the new b and c, reflecting our expectation that the minimum would be around a.

Step 4. Go back to Step 1.


Figure 7.3 Notation used in the Nelder-Mead method: $m = (a + b)/2$, $r = m + (m - c)$, $e = m + 2(m - c)$.
function [xo,fo] = Nelder0(f,abc,fabc,TolX,TolFun,k)
[fabc,I] = sort(fabc); a = abc(I(1),:); b = abc(I(2),:); c = abc(I(3),:);
fa = fabc(1); fb = fabc(2); fc = fabc(3); fba = fb - fa; fcb = fc - fb;
if k <= 0 | abs(fba) + abs(fcb) < TolFun | abs(b - a) + abs(c - b) < TolX
  xo = a; fo = fa;
  if k == 0, fprintf('Just best in given # of iterations'), end
else
  m = (a + b)/2; e = 3*m - 2*c; fe = feval(f,e); %e = m + 2(m - c)
  if fe < fb, c = e; fc = fe;
  else
    r = (m + e)/2; fr = feval(f,r); %r = 2m - c
    if fr < fc, c = r; fc = fr; end
    if fr >= fb
      s = (c + m)/2; fs = feval(f,s);
      if fs < fc, c = s; fc = fs;
      else b = m; c = (a + c)/2; fb = feval(f,b); fc = feval(f,c); %shrink toward a
      end
    end
  end
  [xo,fo] = Nelder0(f,[a;b;c],[fa fb fc],TolX,TolFun,k - 1);
end


function [xo,fo] = opt_Nelder(f,x0,TolX,TolFun,MaxIter)
if nargin < 5, MaxIter = 100; end %defaults so that opt_Nelder(f,x0) works
if nargin < 4, TolFun = 1e-8; end
if nargin < 3, TolX = 1e-6; end
N = length(x0);
if N == 1 %for 1-dimensional case
  [xo,fo] = opt_quad(f,x0,TolX,TolFun); return
end
S = eye(N);
for i = 1:N %repeat the procedure for each subplane
  i1 = i + 1; if i1 > N, i1 = 1; end
  abc = [x0; x0 + S(i,:); x0 + S(i1,:)]; %each directional subplane
  fabc = [feval(f,abc(1,:)); feval(f,abc(2,:)); feval(f,abc(3,:))];
  [x0,fo] = Nelder0(f,abc,fabc,TolX,TolFun,MaxIter);
  if N < 3, break; end %No repetition needed for a 2-dimensional case
end
xo = x0;


%nm713.m: do_Nelder
f713 = inline('x(1)*(x(1)-4-x(2)) + x(2)*(x(2)-1)','x');
x0 = [0 0], TolX = 1e-4; TolFun = 1e-9; MaxIter = 100;
[xon,fon] = opt_Nelder(f713,x0,TolX,TolFun,MaxIter)
%minimum point and its function value
[xos,fos] = fminsearch(f713,x0) %use the MATLAB built-in function


This program also applies the MATLAB built-in routine “fminsearch()” to minimize the same objective function for practice and confirmation. The minimization process is illustrated in Fig. 7.4.

(cf) The MATLAB built-in routine “fminsearch()” uses the Nelder-Mead algorithm to minimize a multivariable objective function. It corresponds to “fmins()” in MATLAB version 5.x.
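To show the box above at work, here is a Python sketch of the two-dimensional case applied to (7.1.6). It starts from the same triangle that “opt_Nelder()” builds from x0 = [0 0], namely (0, 0), (1, 0), (0, 1). It is an illustrative translation of “Nelder0()” written as a loop. The function name `nelder_2d` and its tolerances are assumptions, and this is not how “fminsearch()” is implemented.

```python
def f713(x):
    return x[0] * (x[0] - 4 - x[1]) + x[1] * (x[1] - 1)      # objective (7.1.6)

def nelder_2d(f, abc, tol_x=1e-10, tol_f=1e-14, max_iter=1000):
    """2-D Nelder-Mead variant of the box above (cf. Nelder0())."""
    pts = [tuple(p) for p in abc]
    for _ in range(max_iter):
        a, b, c = sorted(pts, key=f)                          # f(a) <= f(b) <= f(c)
        fa, fb, fc = f(a), f(b), f(c)
        spread = (sum(abs(p - q) for p, q in zip(b, a))
                  + sum(abs(p - q) for p, q in zip(c, b)))
        if abs(fb - fa) + abs(fc - fb) < tol_f or spread < tol_x:   # Step 2
            break
        m = tuple((p + q) / 2 for p, q in zip(a, b))
        e = tuple(3 * p - 2 * q for p, q in zip(m, c))        # e = m + 2(m - c)
        if f(e) < fb:
            c = e
        else:
            r = tuple(2 * p - q for p, q in zip(m, c))        # r = 2m - c
            fr = f(r)
            if fr < fc:
                c, fc = r, fr
            if fr >= fb:
                s = tuple((p + q) / 2 for p, q in zip(c, m))  # s = (c + m)/2
                if f(s) < fc:
                    c = s
                else:                                         # shrink toward a
                    b = m
                    c = tuple((p + q) / 2 for p, q in zip(a, c))
        pts = [a, b, c]
    best = min(pts, key=f)
    return best, f(best)

xo, fo = nelder_2d(f713, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
print(xo, fo)        # expected near (3, 2) and -7
```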
7.1.4 Steepest Descent Method

This method searches for the minimum of an N-dimensional objective function in the direction of a negative gradient

$$ -\mathbf{g}(\mathbf{x}) = -\nabla f(\mathbf{x}) = -\left[\frac{\partial f(\mathbf{x})}{\partial x_1}\ \ \frac{\partial f(\mathbf{x})}{\partial x_2}\ \ \cdots\ \ \frac{\partial f(\mathbf{x})}{\partial x_N}\right]^T \tag{7.1.7} $$


with the step size $\alpha_k$ (at iteration k) adjusted so that the function value is minimized along the direction by a (one-dimensional) line search technique like the quadratic approximation method. The algorithm of the steepest descent method is summarized in the following box and cast into the MATLAB routine “opt_steep()”.

We made the MATLAB program “nm714.m” to minimize the objective function (7.1.6) by using the steepest descent method. The minimization process is illustrated in Fig. 7.5.


STEEPEST DESCENT ALGORITHM

Step 0. With the iteration number k = 0, find the function value $f_0 = f(\mathbf{x}_0)$ for the initial point $\mathbf{x}_0$.

Step 1. Increment the iteration number k by one, and find the step size $\alpha_{k-1}$ along the direction of the negative gradient $-\mathbf{g}_{k-1}$ by a (one-dimensional) line search like the quadratic approximation method:

$$ \alpha_{k-1} = \operatorname{ArgMin}_{\alpha} f(\mathbf{x}_{k-1} - \alpha\,\mathbf{g}_{k-1}/\|\mathbf{g}_{k-1}\|) \tag{7.1.8} $$

Step 2. Move the approximate minimum by the step size $\alpha_{k-1}$ along the direction of the negative gradient $-\mathbf{g}_{k-1}$ to get the next point

$$ \mathbf{x}_k = \mathbf{x}_{k-1} - \alpha_{k-1}\mathbf{g}_{k-1}/\|\mathbf{g}_{k-1}\| \tag{7.1.9} $$

Step 3. If $\mathbf{x}_k \approx \mathbf{x}_{k-1}$ and $f(\mathbf{x}_k) \approx f(\mathbf{x}_{k-1})$, then declare $\mathbf{x}_k$ to be the minimum and terminate the procedure. Otherwise, go back to Step 1.
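The box above can be sketched in Python for the objective (7.1.6). The sketch departs from “opt_steep()” in three labeled ways. The analytic gradient (7.1.14) stands in for the numerical grad() routine. The line search (7.1.8) is a plain golden-section search over $\alpha \in [0, 10]$ rather than the book's step-doubling scheme. The stopping test uses the gradient norm instead of Step 3. All names are hypothetical.

```python
import math

def f713(x):
    return x[0] ** 2 - x[0] * x[1] - 4 * x[0] + x[1] ** 2 - x[1]   # (7.1.6)

def grad713(x):
    return [2 * x[0] - x[1] - 4, 2 * x[1] - x[0] - 1]              # (7.1.14)

def line_search(phi, lo=0.0, hi=10.0, tol=1e-10):
    """Golden-section search for ArgMin_alpha phi(alpha) on [lo, hi]."""
    r = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c, d = hi - r * (hi - lo), lo + r * (hi - lo)
        if phi(c) < phi(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2

def steepest_descent(f, grad, x, tol_g=1e-8, max_iter=200):
    for _ in range(max_iter):
        g = grad(x)
        gnorm = math.sqrt(sum(gi * gi for gi in g))
        if gnorm < tol_g:                                 # gradient ~ 0: stop
            break
        u = [gi / gnorm for gi in g]                      # unit gradient g/||g||

        def phi(a, x=x, u=u):
            return f([xi - a * ui for xi, ui in zip(x, u)])

        alpha = line_search(phi)                          # (7.1.8)
        x = [xi - alpha * ui for xi, ui in zip(x, u)]     # (7.1.9)
    return x, f(x)

x_sd, f_sd = steepest_descent(f713, grad713, [0.0, 0.0])
print(x_sd, f_sd)
```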


function [xo,fo] = opt_steep(f,x0,TolX,TolFun,alpha0,MaxIter)
% minimize the ftn f by the steepest descent method.
%input: f = ftn to be given as a string 'f'
%       x0 = the initial guess of the solution
%output: xo = the minimum point reached
%        fo = f(xo)
if nargin < 6, MaxIter = 100; end %maximum # of iteration
if nargin < 5, alpha0 = 10; end %initial step size
if nargin < 4, TolFun = 1e-8; end %|f(x)| < TolFun wanted
if nargin < 3, TolX = 1e-6; end %|x(k) - x(k - 1)| < TolX wanted
x = x0; fx0 = feval(f,x0); fx = fx0;
alpha = alpha0; kmax1 = 25;
warning = 0; %the # of vain wanderings to find the optimum step size
for k = 1:MaxIter
  g = grad(f,x); g = g/norm(g); %gradient as a row vector
  alpha = alpha*2; %for trial move in negative gradient direction
  fx1 = feval(f,x - alpha*2*g);
  for k1 = 1:kmax1 %find the optimum step size(alpha) by line search
    fx2 = fx1; fx1 = feval(f,x - alpha*g);
    if fx0 > fx1 + TolFun & fx1 < fx2 - TolFun %fx0 > fx1 < fx2






7.1.5 Newton Method 


Like the steepest descent method, this method also uses the gradient to search for the minimum point of an objective function. Such gradient-based optimization methods are supposed to reach a point at which the gradient is (close to) zero. In this context, the optimization of an objective function $f(\mathbf{x})$ is equivalent to finding a zero of its gradient $\mathbf{g}(\mathbf{x})$, which in general is a vector-valued function of a vector-valued independent variable $\mathbf{x}$. Therefore, if we have the gradient function $\mathbf{g}(\mathbf{x})$ of the objective function $f(\mathbf{x})$, we can solve the system of nonlinear equations $\mathbf{g}(\mathbf{x}) = \mathbf{0}$ to get the minimum of $f(\mathbf{x})$ by using the Newton method explained in Section 4.4.

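The background of this method, as well as that of the steepest descent method, can be shown by taking the Taylor series of, say, a two-variable objective function $f(x_1, x_2)$:

$$
\begin{aligned}
f(x_1, x_2) &\cong f(x_{1k}, x_{2k}) + \left[\frac{\partial f}{\partial x_1}\ \ \frac{\partial f}{\partial x_2}\right]_{(x_{1k}, x_{2k})}\begin{bmatrix} x_1 - x_{1k}\\ x_2 - x_{2k} \end{bmatrix}\\
&\quad + \frac{1}{2}\,[x_1 - x_{1k}\ \ x_2 - x_{2k}]\begin{bmatrix} \partial^2 f/\partial x_1^2 & \partial^2 f/\partial x_1\partial x_2\\ \partial^2 f/\partial x_2\partial x_1 & \partial^2 f/\partial x_2^2 \end{bmatrix}_{(x_{1k}, x_{2k})}\begin{bmatrix} x_1 - x_{1k}\\ x_2 - x_{2k} \end{bmatrix}\\
f(\mathbf{x}) &\cong f(\mathbf{x}_k) + \nabla f(\mathbf{x})^T\big|_{\mathbf{x}_k}[\mathbf{x} - \mathbf{x}_k] + \frac{1}{2}[\mathbf{x} - \mathbf{x}_k]^T\nabla^2 f(\mathbf{x})\big|_{\mathbf{x}_k}[\mathbf{x} - \mathbf{x}_k]\\
f(\mathbf{x}) &\cong f(\mathbf{x}_k) + \mathbf{g}_k^T[\mathbf{x} - \mathbf{x}_k] + \frac{1}{2}[\mathbf{x} - \mathbf{x}_k]^T H_k[\mathbf{x} - \mathbf{x}_k]
\end{aligned}
\tag{7.1.10}
$$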


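with the gradient vector $\mathbf{g}_k = \nabla f(\mathbf{x})|_{\mathbf{x}_k}$ and the Hessian matrix $H_k = \nabla^2 f(\mathbf{x})|_{\mathbf{x}_k}$. In the light of this equation, we can see that the value of the objective function at the point $\mathbf{x}_{k+1}$ updated by the steepest descent algorithm described by Eq. (7.1.9),

$$ \mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k\mathbf{g}_k/\|\mathbf{g}_k\|, $$

is most likely smaller than that at the old point $\mathbf{x}_k$, with the third term in Eq. (7.1.10) neglected:

$$
\begin{aligned}
f(\mathbf{x}_{k+1}) &= f(\mathbf{x}_k) + \mathbf{g}_k^T[\mathbf{x}_{k+1} - \mathbf{x}_k] = f(\mathbf{x}_k) - \alpha_k\mathbf{g}_k^T\mathbf{g}_k/\|\mathbf{g}_k\|\\
f(\mathbf{x}_{k+1}) - f(\mathbf{x}_k) &= -\alpha_k\mathbf{g}_k^T\mathbf{g}_k/\|\mathbf{g}_k\| < 0 \;\rightarrow\; f(\mathbf{x}_{k+1}) < f(\mathbf{x}_k)
\end{aligned}
\tag{7.1.11}
$$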


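Slightly different from this strategy of the steepest descent algorithm, the Newton method tries to go straight to the zero of the gradient of the approximate objective function (7.1.10),

$$ \mathbf{g}_k + H_k[\mathbf{x} - \mathbf{x}_k] = \mathbf{0},\quad \mathbf{x} = \mathbf{x}_k - H_k^{-1}\mathbf{g}_k \tag{7.1.12} $$

by the updating rule

$$ \mathbf{x}_{k+1} = \mathbf{x}_k - H_k^{-1}\mathbf{g}_k \tag{7.1.13} $$

with the gradient vector $\mathbf{g}_k = \nabla f(\mathbf{x})|_{\mathbf{x}_k}$ and the Hessian matrix $H_k = \nabla^2 f(\mathbf{x})|_{\mathbf{x}_k}$ (Appendix C).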
This algorithm essentially finds the zero of the gradient function $\mathbf{g}(\mathbf{x})$ of the objective function; consequently, it can be implemented by using any vector nonlinear equation solver. All we have to do is define the gradient function $\mathbf{g}(\mathbf{x})$ and pass its name as an input argument to a routine like “newtons()” or “fsolve()” for solving a system of nonlinear equations (see Section 4.6).

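Now, we make a MATLAB program “nm715.m”, which actually solves $\mathbf{g}(\mathbf{x}) = \mathbf{0}$ for the gradient function

$$ \mathbf{g}(\mathbf{x}) = \nabla f(\mathbf{x}) = \left[\frac{\partial f}{\partial x_1}\ \ \frac{\partial f}{\partial x_2}\right] = [2x_1 - x_2 - 4\ \ \ 2x_2 - x_1 - 1] \tag{7.1.14} $$

of the objective function (7.1.6)

$$ f(\mathbf{x}) = f(x_1, x_2) = x_1^2 - x_1x_2 - 4x_1 + x_2^2 - x_2 $$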

Figure 7.5 illustrates the process of searching for the minimum point by the Newton algorithm (7.1.13) as well as the steepest descent algorithm (7.1.9). The steepest descent algorithm proceeds in the negative gradient direction until the minimum point along that line is reached, while the Newton algorithm approaches the minimum point almost directly and reaches it in a few iterations.

>> nm715
xo = [3.0000 2.0000], ans = -7


%nm715 to minimize an objective ftn f(x) by the Newton method.
clear, clf
f713 = inline('x(1).^2 - 4*x(1) - x(1).*x(2) + x(2).^2 - x(2)','x');
g713 = inline('[2*x(1) - x(2) - 4, 2*x(2) - x(1) - 1]','x'); %gradient (7.1.14)
x0 = [0 0], TolX = 1e-4; TolFun = 1e-6; MaxIter = 50;
[xo,go,xx] = newtons(g713,x0,TolX,MaxIter);
xo, f713(xo) %an extremum point reached and its function value
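Because (7.1.6) is quadratic, its Hessian is the constant matrix [[2, −1], [−1, 2]], and the update (7.1.13) lands on the minimum in one step from any starting point. The minimal Python sketch below handles the two-variable case. It solves $H\Delta\mathbf{x} = \mathbf{g}$ by Cramer's rule rather than through a general solver like “newtons()”, an assumption made to keep it dependency-free. The names are hypothetical.

```python
def f713(x):
    return x[0] ** 2 - x[0] * x[1] - 4 * x[0] + x[1] ** 2 - x[1]    # (7.1.6)

def grad713(x):
    return [2 * x[0] - x[1] - 4, 2 * x[1] - x[0] - 1]               # (7.1.14)

def hess713(x):
    return [[2.0, -1.0], [-1.0, 2.0]]      # constant Hessian of (7.1.6)

def newton_2d(grad, hess, x, tol=1e-10, max_iter=50):
    """Newton update (7.1.13), x <- x - H^{-1} g, for two variables."""
    for _ in range(max_iter):
        g = grad(x)
        (h11, h12), (h21, h22) = hess(x)
        det = h11 * h22 - h12 * h21        # assumes H is nonsingular
        # solve H dx = g by Cramer's rule (2 x 2 only)
        dx = [(h22 * g[0] - h12 * g[1]) / det,
              (h11 * g[1] - h21 * g[0]) / det]
        x = [x[0] - dx[0], x[1] - dx[1]]
        if max(abs(d) for d in dx) < tol:
            break
    return x

xo = newton_2d(grad713, hess713, [0.0, 0.0])
print(xo, f713(xo))     # the first Newton step already lands on (3, 2), f = -7
```

For a non-quadratic objective the Hessian would change from step to step, and, as Remark 7.1 warns, the iteration may then head for a maximum or a saddle point instead.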



Figure 7.5 Process for the steepest descent method and Newton method (“nm714.m” and “nm715.m”).
Remark 7.1. Weak Point of Newton Method.

The Newton method is usually more efficient than the steepest descent method 
if only it works as illustrated above, but it is not guaranteed to reach the minimum 
point. The decisive weak point of the Newton method is that it may approach one 
of the extrema having zero gradient, which is not necessarily a (local) minimum, 
but possibly a maximum or a saddle point (see Fig. 7.13). 

7.1.6 Conjugate Gradient Method 

Like the steepest descent method or the Newton method, this method also uses the gradient to search for the minimum point of an objective function, but in a different way. It has two versions, the Polak-Ribiere (PR) method and the Fletcher-Reeves (FR) method, which differ only slightly in the search direction vector. This algorithm, summarized in the following box, is cast into the MATLAB routine “opt_conjg()”, which implements PR or FR depending on the last input argument KC = 1 or 2. The quasi-Newton algorithm used in the MATLAB built-in routine “fminunc()” is similar to the conjugate gradient method.

This method borrows the framework of the steepest descent method and needs a bit more effort for computing the search direction vector s(n). With exact line searches, it takes at most N iterations to reach the minimum point in case the objective function is quadratic with a positive-definite Hessian matrix H as

$$ f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T H\mathbf{x} + \mathbf{b}^T\mathbf{x} + c \quad\text{where } \mathbf{x}\text{: an } N\text{-dimensional vector} \tag{7.1.15} $$


CONJUGATE GRADIENT ALGORITHM

Step 0. With the iteration number k = 0, find the objective function value $f_0 = f(\mathbf{x}_0)$ for the initial point $\mathbf{x}_0$.

Step 1. Initialize the inside loop index, the temporary solution, and the search direction vector to $n = 0$, $\mathbf{x}(n) = \mathbf{x}_k$, and $\mathbf{s}(n) = -\mathbf{g}_k = -\mathbf{g}(\mathbf{x}_k)$, respectively, where $\mathbf{g}(\mathbf{x})$ is the gradient of the objective function $f(\mathbf{x})$.

Step 2. For n = 0 to N − 1, repeat the following:

Find the (optimal) step size

$$ \alpha_n = \operatorname{ArgMin}_{\alpha} f(\mathbf{x}(n) + \alpha\mathbf{s}(n)) \tag{7.1.16} $$

and update the temporary solution point to

$$ \mathbf{x}(n+1) = \mathbf{x}(n) + \alpha_n\mathbf{s}(n) \tag{7.1.17} $$

and the search direction vector to

$$ \mathbf{s}(n+1) = -\mathbf{g}_{n+1} + \beta_n\mathbf{s}(n) \tag{7.1.18} $$



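with

$$ \beta_n = \frac{[\mathbf{g}_{n+1} - \mathbf{g}_n]^T\mathbf{g}_{n+1}}{\mathbf{g}_n^T\mathbf{g}_n}\ \text{(PR)} \quad\text{or}\quad \beta_n = \frac{\mathbf{g}_{n+1}^T\mathbf{g}_{n+1}}{\mathbf{g}_n^T\mathbf{g}_n}\ \text{(FR)} \tag{7.1.19} $$

Step 3. Update the approximate solution point to $\mathbf{x}_{k+1} = \mathbf{x}(N)$, which is the last temporary one.

Step 4. If $\mathbf{x}_k \approx \mathbf{x}_{k-1}$ and $f(\mathbf{x}_k) \approx f(\mathbf{x}_{k-1})$, then declare $\mathbf{x}_k$ to be the minimum and terminate the procedure. Otherwise, increment k by one and go back to Step 1.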
Because minimizing this quadratic objective function is equivalent to solving the linear equation

$$ \mathbf{g}(\mathbf{x}) = \nabla f(\mathbf{x}) = H\mathbf{x} + \mathbf{b} = \mathbf{0} \tag{7.1.20} $$

MATLAB has several built-in routines such as “cgs()”, “pcg()”, and “bicg()”, which use the conjugate gradient method to solve a set of linear equations.

We make the MATLAB program “nm716.m” to minimize the objective function (7.1.6) by the conjugate gradient method. The minimization process is illustrated in Fig. 7.6.
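For the quadratic form (7.1.15), the ArgMin in (7.1.16) has a closed form. Setting $\frac{d}{d\alpha}f(\mathbf{x}(n) + \alpha\mathbf{s}(n)) = 0$ gives $\alpha_n = -\mathbf{g}_n^T\mathbf{s}(n)/(\mathbf{s}(n)^T H\mathbf{s}(n))$, so one pass of Step 2 can be sketched without any line search. The Python sketch below assumes that closed form. It writes (7.1.6) as $H = [[2, -1], [-1, 2]]$, $\mathbf{b} = [-4, -1]$, and runs both $\beta$ rules of (7.1.19). The names are hypothetical, and this is not “opt_conjg()” itself. Both rules should reach (3, 2) after N = 2 inner steps, consistent with the "at most N iterations" property stated above.

```python
def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cg_quadratic(H, b, x, rule="PR"):
    """One pass (n = 0..N-1) of the CG box for f = x'Hx/2 + b'x + c."""
    g = [hx + bi for hx, bi in zip(matvec(H, x), b)]     # g = Hx + b, (7.1.20)
    s = [-gi for gi in g]                                 # s(0) = -g
    for _ in range(len(x)):
        gg = dot(g, g)
        if gg == 0.0:                                     # already at the minimum
            break
        Hs = matvec(H, s)
        alpha = -dot(g, s) / dot(s, Hs)                   # exact ArgMin, (7.1.16)
        x = [xi + alpha * si for xi, si in zip(x, s)]     # (7.1.17)
        g_new = [hx + bi for hx, bi in zip(matvec(H, x), b)]
        if rule == "PR":
            beta = dot([gn - go for gn, go in zip(g_new, g)], g_new) / gg
        else:                                             # "FR"
            beta = dot(g_new, g_new) / gg
        s = [-gn + beta * si for gn, si in zip(g_new, s)]  # (7.1.18)
        g = g_new
    return x

H = [[2.0, -1.0], [-1.0, 2.0]]      # (7.1.6) written as x'Hx/2 + b'x
b = [-4.0, -1.0]
x_pr = cg_quadratic(H, b, [0.0, 0.0], "PR")
x_fr = cg_quadratic(H, b, [0.0, 0.0], "FR")
print(x_pr, x_fr)
```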

7.1.7 Simulated Annealing Method [W-7] 

All of the optimization methods discussed so far may be more or less efficient in finding the minimum point if only they start from an initial point sufficiently close to it. But the point they reach may be one of several local minima, and we often cannot be sure that it is the global minimum. How about repeating the procedure to search for all local minima starting from many different initial guesses and taking the best one as the global minimum? This would be a computationally formidable task, since there is no systematic way to determine a suitable sequence of initial guesses, each of which leads to its own (local) minimum, so that all the local minima can be exhaustively found to compete with each other for the global minimum.

An interesting alternative is based on the analogy between annealing and minimization. Annealing is the physical process of heating up a solid metal above its melting point and then cooling it down so slowly that the highly excited atoms can settle into a (global) minimum energy state, yielding a single crystal with a regular structure. Fast cooling by rapid quenching may result in widespread irregularities and defects in the crystal structure, analogous to being too hasty to find the global minimum. The simulated annealing process can be implemented using the Boltzmann probability distribution of an energy level E (> 0) at temperature T described by

$$ p(E) = \alpha\exp(-E/KT) \quad\text{with the Boltzmann constant } K \text{ and } \alpha = 1/KT \tag{7.1.21} $$

Note that at high temperature the probability distribution curve is almost flat over a wide range of E, implying that the system can be in a high energy state just as well as in a low energy state. At low temperature, the probability distribution curve gets higher/lower for lower/higher E, implying that the system will most probably be in a low energy state but still has a slim chance to be in a high energy state, so that it can escape from a local minimum energy state.

The idea of simulated annealing is summarized in the box below and cast into the MATLAB routine “sim_anl()”. This routine has two parts that vary with the iteration number as the temperature falls. One is the size of the step $\Delta\mathbf{x}$ from the previous guess to the next guess. It is made by generating a random vector $\mathbf{y}$ having uniform distribution U[−1, +1] and the same dimension as the variable $\mathbf{x}$, then multiplying $g_\mu^{-1}(\mathbf{y})$ (in a termwise manner) by the difference vector $(\mathbf{u} - \mathbf{l})$ between the upper bound $\mathbf{u}$ and the lower bound $\mathbf{l}$ of the domain of $\mathbf{x}$. The $\mu^{-1}$-law

$$ g_\mu^{-1}(y) = \frac{(1 + \mu)^{|y|} - 1}{\mu}\,\mathrm{sign}(y) \quad\text{for } |y| \le 1 \tag{7.1.22} $$

implemented in the routine “mu_inv()” has the parameter $\mu$, which is increased according to the rule

$$ \mu = 10^{100\,(k/k_{\max})^q} \quad\text{with } q > 0\text{: the quenching factor} \tag{7.1.23} $$

as the iteration number k increases, reaching $\mu = 10^{100}$ at the last iteration $k = k_{\max}$. Note the following:

• The quenching factor q > 0 is made small/large for slow/fast quenching.

• The value of the $\mu^{-1}$-law function becomes small for $|y| < 1$ as $\mu$ increases (see Fig. 7.7a).
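The $\mu^{-1}$-law (7.1.22) and the schedule (7.1.23) are easy to tabulate. The Python sketch below uses hypothetical names that mirror “mu_inv()”. It shows that $g_\mu^{-1}(\pm 1) = \pm 1$ for every $\mu$, while $g_\mu^{-1}(0.5)$ collapses toward zero as $\mu$ grows to $10^{100}$. That collapse is what makes the random steps shrink as the temperature falls.

```python
import math

def mu_inv(y, mu):
    """Inverse mu-law (7.1.22): ((1 + mu)**|y| - 1)/mu * sign(y), |y| <= 1."""
    if y == 0:
        return 0.0
    return ((1.0 + mu) ** abs(y) - 1.0) / mu * math.copysign(1.0, y)

def mu_of_k(k, kmax, q=1.0):
    """Schedule (7.1.23): mu = 10**(100*(k/kmax)**q)."""
    return 10.0 ** (100.0 * (k / kmax) ** q)

kmax = 100
for k in (0, 25, 50, 100):
    mu = mu_of_k(k, kmax)
    print(k, mu, mu_inv(0.5, mu), mu_inv(1.0, mu))
```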

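The other is the probability of taking a step $\Delta\mathbf{x}$ that would result in a change $\Delta f > 0$ of the objective function value $f(\mathbf{x})$. Similarly to Eq. (7.1.21), this is determined by

$$ p(\text{taking the step } \Delta\mathbf{x}) = \exp\left(-\left(\frac{k}{k_{\max}}\right)^q \frac{\Delta f}{|f(\mathbf{x})|\,\varepsilon_f}\right) \quad\text{for } \Delta f > 0 \tag{7.1.24} $$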







Figure 7.7 Illustrative functions used for controlling the randomness-temperature in SA: (a) the $\mu^{-1}$-law inverse function $g_\mu^{-1}(y)$; (b) the exponential function for randomness control, plotted against $\Delta f/|f(\mathbf{x})|/\varepsilon_f$.


SIMULATED ANNEALING

Step 0. Pick the initial guess $\mathbf{x}_0$, the lower bound $\mathbf{l}$, the upper bound $\mathbf{u}$, the maximum number of iterations $k_{\max} > 0$, the quenching factor q > 0 (to be made small/large for slow/fast quenching), and the relative tolerance $\varepsilon_f$ of function value fluctuation.

Step 1. Let $\mathbf{x} = \mathbf{x}_0$, $\mathbf{x}^o = \mathbf{x}$, $f^o = f(\mathbf{x})$.

Step 2. For k = 1 to $k_{\max}$, do

{Generate an N × 1 uniform random vector of U[−1, +1] and transform it by the inverse μ law (with $\mu = 10^{100(k/k_{\max})^q}$) to make $\Delta\mathbf{x}$, and then take $\mathbf{x}_1 \leftarrow \mathbf{x} + \Delta\mathbf{x}$, confining the next guess inside the admissible region $\{\mathbf{x} \mid \mathbf{l} \le \mathbf{x} \le \mathbf{u}\}$ as needed.

If $\Delta f = f(\mathbf{x}_1) - f(\mathbf{x}) < 0$,

{set $\mathbf{x} \leftarrow \mathbf{x}_1$, and if $f(\mathbf{x}) < f^o$, set $\mathbf{x}^o \leftarrow \mathbf{x}$ and $f^o \leftarrow f(\mathbf{x}^o)$.}

Otherwise,

{generate a uniform random number z of U[0, 1] and set $\mathbf{x} \leftarrow \mathbf{x}_1$ only in case
$z < p(\text{taking the step } \Delta\mathbf{x}) \overset{(7.1.24)}{=} \exp\bigl(-(k/k_{\max})^q\,\Delta f/(|f(\mathbf{x})|\,\varepsilon_f)\bigr)$

}

}

Step 3. Regarding $\mathbf{x}^o$ as close to the minimum point that we are looking for, we may set $\mathbf{x}^o$ as the initial value and apply any (local) optimization algorithm to search for the minimum point of $f(\mathbf{x})$.
function [xo,fo] = sim_anl(f,x0,l,u,kmax,q,TolFun)
% simulated annealing method to minimize f(x) s.t. l <= x <= u
N = length(x0);
x = x0; fx = feval(f,x);
xo = x; fo = fx;
if nargin < 7, TolFun = 1e-8; end
if nargin < 6, q = 1; end %quenching factor
if nargin < 5, kmax = 100; end %maximum iteration number
for k = 0:kmax
  Ti = (k/kmax)^q; %inverse of temperature from 0 to 1
  mu = 10^(Ti*100); % Eq.(7.1.23)
  dx = mu_inv(2*rand(size(x)) - 1,mu).*(u - l);
  x1 = x + dx; %next guess
  x1 = (x1 < l).*l + (l <= x1).*(x1 <= u).*x1 + (u < x1).*u;
  %confine it inside the admissible region bounded by l and u.
  fx1 = feval(f,x1); df = fx1 - fx;
  if df < 0 | rand < exp(-Ti*df/(abs(fx) + eps)/TolFun) %Eq.(7.1.24)
    x = x1; fx = fx1;
  end
  if fx < fo, xo = x; fo = fx; end
end

function x = mu_inv(y,mu) % inverse of mu-law Eq.(7.1.22)
x = (((1 + mu).^abs(y) - 1)/mu).*sign(y);


%nm717 to minimize an objective function f(x) by various methods.
clear, clf
f = inline('x(1)^4 - 16*x(1)^2 - 5*x(1) + x(2)^4 - 16*x(2)^2 - 5*x(2)','x');
l = [-5 -5]; u = [5 5]; %lower/upper bound
x0 = [0 0]
[xo_nd,fo] = opt_Nelder(f,x0)
[xos,fos] = fminsearch(f,x0) %cross-check by MATLAB built-in routines
[xou,fou] = fminunc(f,x0)
kmax = 500; q = 1; TolFun = 1e-9;
[xo_sa,fo_sa] = sim_anl(f,x0,l,u,kmax,q,TolFun)


which remains as big as $e^{-1}$ for $|\Delta f/f(\mathbf{x})| = \varepsilon_f$ at the last iteration $k = k_{max}$. This means that the probability of taking a step (hopefully to escape from a local minimum and find the global minimum) at the risk of increasing the value of the objective function by the amount $\Delta f = |f(\mathbf{x})|\varepsilon_f$ is still that high. The shapes of the two functions related to the temperature are depicted in Fig. 7.7.
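To see numerically how the acceptance probability of Step 2 behaves, here is a minimal Python sketch of Eq. (7.1.24). The function name accept_prob and the constant EPS (standing in for MATLAB's eps) are our own choices, not part of the book's code. A downhill step is always taken; an uphill step is taken with probability exp(-(k/kmax)^q Δf/|f(x)|/εf), which starts at 1 for k = 0 and falls to e^-1 at k = kmax when |Δf/f(x)| = εf.

```python
import math
import sys

EPS = sys.float_info.epsilon  # plays the role of MATLAB's eps


def accept_prob(df, fx, k, kmax, q=1.0, tol_fun=1e-8):
    """Probability of taking a trial step with function change df at
    iteration k, following Eq. (7.1.24):
        1                                          if df < 0 (downhill)
        exp(-(k/kmax)**q * df / |f(x)| / tol_fun)  otherwise
    """
    if df < 0:
        return 1.0                      # downhill steps are always taken
    inv_temp = (k / kmax) ** q          # inverse temperature, rises 0 -> 1
    return math.exp(-inv_temp * df / (abs(fx) + EPS) / tol_fun)


if __name__ == "__main__":
    fx, tol = 100.0, 1e-8
    df = abs(fx) * tol                  # |df/f(x)| equals the tolerance
    for k in (0, 250, 500):
        print(k, accept_prob(df, fx, k, 500, q=1.0, tol_fun=tol))
```

Raising q keeps the inverse temperature small for longer (slow quenching), so uphill steps stay likely over more iterations.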

We make the MATLAB program “nm717.m”, which uses the routine “sim_anl()” to minimize the function

$$f(\mathbf{x}) = x_1^4 - 16x_1^2 - 5x_1 + x_2^4 - 16x_2^2 - 5x_2 \qquad (7.1.25)$$

and tries other routines such as “opt_Nelder()”, “fminsearch()”, and “fminunc()” for cross-checking. The results of running the program are summarized in Table 7.1, which shows that the routine “sim_anl()” may give us the global minimum even when some other routines fail to find it. However, even this routine based on the idea of simulated annealing cannot always succeed: its success/failure depends partially on the initial guess and partially on luck, while the success/failure of the other routines depends solely on the initial guess.
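For readers without MATLAB, the routine “sim_anl()” can be transcribed into Python roughly as follows. This is only a sketch under our own assumptions: the names sim_anl, mu_inv and f7125 mirror the MATLAB code, a seeded random.Random replaces rand, and, as noted above, a given run may still end in a local minimum because the outcome partly depends on luck. It minimizes the function (7.1.25) over the box [-5, 5] x [-5, 5].

```python
import math
import random
import sys

EPS = sys.float_info.epsilon


def mu_inv(y, mu):
    """Inverse mu-law, Eq. (7.1.22): maps y in [-1, 1] into [-1, 1];
    a larger mu pushes the output toward 0, i.e. shorter steps."""
    return ((1.0 + mu) ** abs(y) - 1.0) / mu * math.copysign(1.0, y)


def sim_anl(f, x0, lo, up, kmax=100, q=1.0, tol_fun=1e-8, seed=None):
    """Simulated annealing following the steps of the box above (a sketch)."""
    rng = random.Random(seed)
    x = list(x0)
    fx = f(x)
    xo, fo = list(x), fx
    for k in range(kmax + 1):
        ti = (k / kmax) ** q                 # inverse temperature, 0 -> 1
        mu = 10.0 ** (100.0 * ti)            # Eq. (7.1.23)
        # trial point: mu-law shaped random step scaled to the box width
        x1 = [xi + mu_inv(2.0 * rng.random() - 1.0, mu) * (u - l)
              for xi, l, u in zip(x, lo, up)]
        x1 = [min(max(v, l), u) for v, l, u in zip(x1, lo, up)]  # stay in box
        fx1 = f(x1)
        df = fx1 - fx
        if df < 0 or rng.random() < math.exp(-ti * df / (abs(fx) + EPS) / tol_fun):
            x, fx = x1, fx1                  # take the step (Eq. (7.1.24))
            if fx < fo:
                xo, fo = list(x), fx         # remember the best point so far
    return xo, fo


def f7125(x):
    """Objective function of Eq. (7.1.25)."""
    return sum(t**4 - 16 * t**2 - 5 * t for t in x)


if __name__ == "__main__":
    print(sim_anl(f7125, [0.0, 0.0], [-5, -5], [5, 5],
                  kmax=500, q=1, tol_fun=1e-9, seed=1))
```

As with the MATLAB version, it is natural to hand the returned point to a local routine (Step 3) for polishing.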
Table 7.1 Results of Running Several Optimization Routines with Various Initial Values

x0             opt_Nelder()         fminsearch()         fminunc()            sim_anl()
[0, 0]         [2.9035, 2.9035]     [2.9035, 2.9036]     [2.9036, 2.9036]     [2.8966, 2.9036]
               (fo = -156.66)       (fo = -156.66)       (fo = -156.66)       (fo = -156.66)
[-0.5, -1.0]   [2.9035, -2.7468]    [-2.7468, -2.7468]   [-2.7468, -2.7468]   [2.9029, 2.9028]
               (fo = -128.39)       (fo = -100.12)       (fo = -100.12)       (fo = -156.66)


7.1.8 Genetic Algorithm [W-7] 

The genetic algorithm (GA) is a directed random search technique that is modeled on the natural evolution/selection process toward the survival of the fittest. The genetic operators deal with the individuals in a population over several generations to improve their fitness gradually. Individuals standing for possible solutions are often compared to chromosomes and represented by strings of binary numbers. Like the simulated annealing method, GA is also expected to find the global minimum solution even in the case where the objective function has several extrema, including local maxima and saddle points as well as local minima.

A so-called hybrid genetic algorithm [P-2] consists of initialization, evaluation, reproduction (selection), crossover, and mutation as depicted in Fig. 7.8


Figure 7.8 Flowchart for a genetic algorithm.
Figure 7.9 Reproduction/crossover/mutation in one iteration of genetic algorithm (Np = 8, N = 2, Nb = [8 8]).


and is summarized in the box below. The reproduction/crossover process is illustrated in Fig. 7.9. This algorithm is cast into the routine “genetic()”, and we append the following statements to the MATLAB program “nm717.m” in order to apply the routine for minimizing the function defined by Eq. (7.1.25). Interested readers are welcome to run the program with these statements appended and compare the result with those of the other routines. Note that, like simulated annealing, the routine based on the idea of GA cannot always succeed, and its success/failure depends partially on the initial guess and partially on luck.
Np = 30; %population size
Nb = [12 12]; %the numbers of bits for representing each variable
Pc = 0.5; Pm = 0.01; %Probability of crossover/mutation
eta = 1; kmax = 100; %learning rate and the maximum # of iterations
[xo_gen,fo_gen] = genetic(f,x0,l,u,Np,Nb,Pc,Pm,eta,kmax)


HYBRID GENETIC ALGORITHM 

Step 0. Pick the initial guess $\mathbf{x}_0 = [x_{01} \cdots x_{0N}]$ ($N$: the dimension of the variable), the lower bound $\mathbf{l} = [l_1 \cdots l_N]$, the upper bound $\mathbf{u} = [u_1 \cdots u_N]$, the population size $N_p$, the vector $\mathbf{N}_b = [N_{b1} \cdots N_{bN}]$ consisting of the numbers of bits assigned for the representation of each variable $x_i$, the probability of crossover $P_c$, the probability of mutation $P_m$, the learning rate $\eta$ ($0 < \eta \le 1$, to be made small/large for slow/fast learning), and the maximum number of iterations $k_{max} > 0$. Note that the dimensions of $\mathbf{x}_0$, $\mathbf{u}$, and $\mathbf{l}$ are all the same as $N$, which is the dimension of the variable $\mathbf{x}$ to be found, and that the population size $N_p$ cannot be greater than $2^{N_b}$ in order to avoid duplicated chromosomes and should be an even integer for constituting the mating pool in the crossover stage.

Step 1. Random Generation of Initial Population

Set $\mathbf{x}^o = \mathbf{x}_0$, $f^o = f(\mathbf{x}^o)$ and construct in a random way the initial population array $\mathbf{X}_1$ that consists of $N_p$ states (in the admissible region bounded by $\mathbf{u}$ and $\mathbf{l}$) including the initial state $\mathbf{x}_0$, by setting

$$\mathbf{X}_1(1) = \mathbf{x}_0 \;\text{ and }\; \mathbf{X}_1(k) = \mathbf{l} + \text{rand}.*(\mathbf{u} - \mathbf{l}) \;\text{ for } k = 2 : N_p \qquad (7.1.26)$$

where rand is a random vector of the same dimension $N$ as $\mathbf{x}_0$, $\mathbf{u}$, and $\mathbf{l}$. Then, encode each number of this population array into a binary string by

$$\mathbf{P}_1\Big(n, 1 + \sum_{i=1}^{m-1} N_{bi} : \sum_{i=1}^{m} N_{bi}\Big) = \text{binary representation of } \frac{(\mathbf{X}_1(n,m) - l(m))\,(2^{N_{bm}} - 1)}{u(m) - l(m)} \text{ with } N_{bm} \text{ bits}$$
$$\text{for } n = 1 : N_p \text{ and } m = 1 : N \qquad (7.1.27)$$

so that the whole population array becomes a pool array, each row of which is a chromosome represented by a binary string of $\sum_{i=1}^{N} N_{bi}$ bits.
Step 2. For $k = 1$ to $k_{max}$, do the following:

1. Decode each number in the pool into a (decimal) number by

$$\mathbf{X}_k(n,m) = \Big\{\text{decimal representation of } \mathbf{P}_k\Big(n, 1 + \sum_{i=1}^{m-1} N_{bi} : \sum_{i=1}^{m} N_{bi}\Big) \text{ with } N_{bm} \text{ bits}\Big\} \frac{u(m) - l(m)}{2^{N_{bm}} - 1} + l(m)$$
$$\text{for } n = 1 : N_p \text{ and } m = 1 : N \qquad (7.1.28)$$

and evaluate the value $f(n)$ of the function for every row $\mathbf{X}_k(n,:) = \mathbf{x}(n)$ corresponding to each chromosome and find the minimum $f_{min} = f(n_b)$ corresponding to $\mathbf{X}_k(n_b,:) = \mathbf{x}(n_b)$.

2. If $f_{min} = f(n_b) < f^o$, then set $f^o = f(n_b)$ and $\mathbf{x}^o = \mathbf{x}(n_b)$.

3. Convert the function values into the values of fitness by

$$f_1(n) = \text{Max}_{n=1}^{N_p}\{f(n)\} - f(n) \qquad (7.1.29)$$

which is nonnegative $\forall\, n = 1 : N_p$ and is large for a good chromosome.

4. If $\text{Max}_{n=1}^{N_p}\{f_1(n)\} \approx 0$, then terminate this procedure, declaring $\mathbf{x}^o$ as the best.

Otherwise, in order to make more chromosomes around the best point $\mathbf{x}(n_b)$ in the next generation, use the reproduction rule

$$\mathbf{x}(n) \leftarrow \mathbf{x}(n) + \eta\, \frac{f_1(n_b) - f_1(n)}{f_1(n_b)}\, \big(\mathbf{x}(n_b) - \mathbf{x}(n)\big) \qquad (7.1.30)$$

to get a new population $\mathbf{X}_{k+1}$ with $\mathbf{X}_{k+1}(n,:) = \mathbf{x}(n)$ and encode it to reconstruct a new pool array $\mathbf{P}_{k+1}$ by Eq. (7.1.27).

5. Shuffle the row indices of the pool array for random mating of the chromosomes.

6. With the crossover probability $P_c$, exchange the tail part starting from some random bit of the numbers in two randomly paired chromosomes (rows of $\mathbf{P}_{k+1}$) with each other's to get a new pool array $\mathbf{P}'_{k+1}$.

7. With the mutation probability $P_m$, reverse a random bit of each number represented by chromosomes (rows of $\mathbf{P}'_{k+1}$) to make a new pool array $\mathbf{P}_{k+1}$.
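Steps 1 and 2 (items 1 to 4) of the box above can be sketched in Python as follows. This is our own illustration rather than a transcription of “gen_encode()”/“gen_decode()”: chromosomes are Python strings of '0'/'1', the quantization level in Eq. (7.1.27) is rounded to the nearest integer (an assumption; other rounding rules are possible), and the hypothetical function reproduce() applies Eqs. (7.1.29) and (7.1.30) to a whole population at once.

```python
import sys


def encode(x, nb, lo, up):
    """Eq. (7.1.27): quantize each x[m] in [lo[m], up[m]] to an nb[m]-bit
    binary string and concatenate the fields into one chromosome."""
    bits = ""
    for xm, n, l, u in zip(x, nb, lo, up):
        level = round((2**n - 1) * (xm - l) / (u - l))   # integer 0 .. 2^n - 1
        bits += format(level, "0{}b".format(n))
    return bits


def decode(bits, nb, lo, up):
    """Eq. (7.1.28): map each nb[m]-bit field back to a number in [lo, up]."""
    x, b2 = [], 0
    for n, l, u in zip(nb, lo, up):
        b1, b2 = b2, b2 + n
        x.append(int(bits[b1:b2], 2) * (u - l) / (2**n - 1) + l)
    return x


def reproduce(pop, fvals, eta=1.0):
    """Eqs. (7.1.29)-(7.1.30): fitness f1(n) = max f - f(n); then pull each
    state toward the best one in proportion to its fitness deficit."""
    fmax = max(fvals)
    fit = [fmax - fv for fv in fvals]                     # >= 0, large = good
    nb = min(range(len(fvals)), key=lambda n: fvals[n])   # index of the best
    if fit[nb] < sys.float_info.epsilon:                  # all states equal
        return [list(x) for x in pop]
    xb = pop[nb]
    return [[xi + eta * (fit[nb] - fn) / fit[nb] * (bi - xi)
             for xi, bi in zip(x, xb)]
            for x, fn in zip(pop, fit)]


if __name__ == "__main__":
    bits = encode([1.2, -3.7], [12, 12], [-5, -5], [5, 5])
    print(bits, decode(bits, [12, 12], [-5, -5], [5, 5]))
    print(reproduce([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], [5.0, 1.0, 3.0]))
```

With eta = 1 the worst state (fitness 0) is moved all the way onto the best one, and the best state itself never moves.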
function [xo,fo] = genetic(f,x0,l,u,Np,Nb,Pc,Pm,eta,kmax)
% Genetic Algorithm to minimize f(x) s.t. l <= x <= u
N = length(x0);
if nargin < 10, kmax = 100; end %# of iterations(generations)
if nargin < 9 || eta > 1 || eta <= 0, eta = 1; end %learning rate(0 < eta <= 1)
if nargin < 8, Pm = 0.01; end %probability of mutation
if nargin < 7, Pc = 0.5; end %probability of crossover
if nargin < 6, Nb = 8*ones(1,N); end %# of genes(bits) for each variable
if nargin < 5, Np = 10; end %population size(number of chromosomes)
%Initialize the population pool
xo = x0(:)'; l = l(:)'; u = u(:)';
fo = feval(f,xo);
X(1,:) = xo;
for n = 2:Np, X(n,:) = l + rand(1,N).*(u - l); end %Eq.(7.1.26)
P = gen_encode(X,Nb,l,u); %Eq.(7.1.27)
for k = 1:kmax
  X = gen_decode(P,Nb,l,u); %Eq.(7.1.28)
  for n = 1:Np, fX(n) = feval(f,X(n,:)); end
  [fxb,nb] = min(fX); %Selection of the fittest
  if fxb < fo, fo = fxb; xo = X(nb,:); end
  fX1 = max(fX) - fX; %make the nonnegative fitness vector by Eq.(7.1.29)
  fXm = fX1(nb);
  if fXm < eps, return; end %terminate if all the chromosomes are equal
  %Reproduction of next generation
  for n = 1:Np
    X(n,:) = X(n,:) + eta*(fXm - fX1(n))/fXm*(X(nb,:) - X(n,:)); %Eq.(7.1.30)
  end
  P = gen_encode(X,Nb,l,u);
  %Mating/Crossover
  is = shuffle([1:Np]);
  for n = 1:2:Np - 1
    if rand < Pc, P(is(n:n + 1),:) = crossover(P(is(n:n + 1),:),Nb); end
  end
  %Mutation
  P = mutation(P,Nb,Pm);
end

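function P = gen_encode(X,Nb,l,u)
% encode a population(X) of state into an array(P) of binary strings
Np = size(X,1); %population size
N = length(Nb); %dimension of the variable(state)
for n = 1:Np
  b2 = 0;
  for m = 1:N
    b1 = b2 + 1; b2 = b2 + Nb(m);
    Xnm = (2^Nb(m) - 1)*(X(n,m) - l(m))/(u(m) - l(m)); %Eq.(7.1.27)
    P(n,b1:b2) = dec2bin(round(Xnm),Nb(m)); %encoding to binary strings
  end
end

function X = gen_decode(P,Nb,l,u)
% decode an array of binary strings(P) into a population(X) of state
Np = size(P,1); %population size
N = length(Nb); %dimension of the variable(state)
for n = 1:Np
  b2 = 0;
  for m = 1:N
    b1 = b2 + 1; b2 = b2 + Nb(m);
    X(n,m) = bin2dec(P(n,b1:b2))*(u(m) - l(m))/(2^Nb(m) - 1) + l(m); %Eq.(7.1.28)
  end
end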
function chrms2 = crossover(chrms2,Nb)
% crossover between two chromosomes
Nbb = length(Nb);
b2 = 0;
for m = 1:Nbb
  b1 = b2 + 1; bi = b1 + mod(floor(rand*Nb(m)),Nb(m)); b2 = b2 + Nb(m);
  tmp = chrms2(1,bi:b2);
  chrms2(1,bi:b2) = chrms2(2,bi:b2);
  chrms2(2,bi:b2) = tmp;
end

function P = mutation(P,Nb,Pm) % mutation
Nbb = length(Nb);
for n = 1:size(P,1)
  b2 = 0;
  for m = 1:Nbb
    b1 = b2 + 1; b2 = b2 + Nb(m); %bit range of the m-th variable
    if rand < Pm
      bi = b1 + mod(floor(rand*Nb(m)),Nb(m));
      P(n,bi) = char('0' + '1' - P(n,bi)); %flip the character '0' <-> '1'
    end
  end
end

function is = shuffle(is) % shuffle
N = length(is);
for n = N:-1:2
  in = ceil(rand*n); tmp = is(in); %random index in 1..n
  is(in) = is(n); is(n) = tmp; %swap the n-th element with the in-th one
end
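The bit-level operators of items 5 to 7 in Step 2 can be mimicked on Python strings as below. This is a hedged sketch with hypothetical function names, not the book's code: crossover() swaps the tail of every variable field from a random cut point, mutate() flips at most one random bit per field with probability pm (the field boundaries advance for every variable whether or not a mutation occurs), and shuffle() is the standard Fisher-Yates shuffle.

```python
import random


def crossover(c1, c2, nb, rng):
    """Swap the tails of every nb[m]-bit field of two chromosomes, starting
    at a random bit inside the field."""
    a, b = list(c1), list(c2)
    b2 = 0
    for n in nb:
        b1, b2 = b2, b2 + n
        bi = b1 + rng.randrange(n)               # random cut point in the field
        a[bi:b2], b[bi:b2] = b[bi:b2], a[bi:b2]  # exchange the tails
    return "".join(a), "".join(b)


def mutate(chrom, nb, pm, rng):
    """With probability pm per field, flip one randomly chosen bit of it."""
    bits = list(chrom)
    b2 = 0
    for n in nb:
        b1, b2 = b2, b2 + n                      # advance the field every time
        if rng.random() < pm:
            bi = b1 + rng.randrange(n)
            bits[bi] = "1" if bits[bi] == "0" else "0"
    return "".join(bits)


def shuffle(idx, rng):
    """Fisher-Yates shuffle of the row indices used for random mating."""
    idx = list(idx)
    for n in range(len(idx) - 1, 0, -1):
        j = rng.randrange(n + 1)
        idx[j], idx[n] = idx[n], idx[j]
    return idx


if __name__ == "__main__":
    rng = random.Random(0)
    print(crossover("0000000011111111", "1111111100000000", [8, 8], rng))
    print(mutate("0000000011111111", [8, 8], 0.5, rng))
    print(shuffle(range(8), rng))
```

Because the cut point lies inside the field, the last bit of every field is always exchanged, so a crossover between two different parents always changes both children.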


7.2 CONSTRAINED OPTIMIZATION [L-2, CHAPTER 10] 

In this section, only the concept of constrained optimization is introduced. The 
explanation for the usage of the corresponding MATLAB routines is postponed 
until the next section. 


7.2.1 Lagrange Multiplier Method 

A class of common optimization problems subject to equality constraints may 
be nicely handled by the Lagrange multiplier method. Consider an optimization 
problem with M equality constraints. 


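$$\text{Min } f(\mathbf{x}) \qquad (7.2.1a)$$

$$\text{s.t. } \mathbf{h}(\mathbf{x}) = \begin{bmatrix} h_1(\mathbf{x}) \\ h_2(\mathbf{x}) \\ \vdots \\ h_M(\mathbf{x}) \end{bmatrix} = \mathbf{0} \qquad (7.2.1b)$$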

According to the Lagrange multiplier method, this problem can be converted 
to the following unconstrained optimization problem: 


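$$\text{Min } l(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) + \boldsymbol{\lambda}^T \mathbf{h}(\mathbf{x}) = f(\mathbf{x}) + \sum_{m=1}^{M} \lambda_m h_m(\mathbf{x}) \qquad (7.2.2)$$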
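The solution of this problem, if it exists, can be obtained by setting the derivatives of this new objective function $l(\mathbf{x}, \boldsymbol{\lambda})$ with respect to $\mathbf{x}$ and $\boldsymbol{\lambda}$ to zero:

$$\frac{\partial}{\partial \mathbf{x}} l(\mathbf{x}, \boldsymbol{\lambda}) = \frac{\partial}{\partial \mathbf{x}} f(\mathbf{x}) + \boldsymbol{\lambda}^T \frac{\partial}{\partial \mathbf{x}} \mathbf{h}(\mathbf{x}) = \nabla f(\mathbf{x}) + \sum_{m=1}^{M} \lambda_m \nabla h_m(\mathbf{x}) = \mathbf{0} \qquad (7.2.3a)$$

$$\frac{\partial}{\partial \boldsymbol{\lambda}} l(\mathbf{x}, \boldsymbol{\lambda}) = \mathbf{h}(\mathbf{x}) = \mathbf{0} \qquad (7.2.3b)$$

Note that the solutions of this system of equations are only candidates for the extrema of the objective function. If the second derivative (Hessian matrix) of $l(\mathbf{x}, \boldsymbol{\lambda})$ with respect to $\mathbf{x}$ is positive/negative-definite at a solution, that solution is a (local) minimum/maximum. Let us see the following examples.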

Remark 7.2. Inequality Constraints with the Lagrange Multiplier Method. 

Even if the optimization problem involves inequality constraints like $g_j(\mathbf{x}) \le 0$, we can convert them to equality constraints by introducing the (nonnegative) slack terms $y_j^2$ as

$$g_j(\mathbf{x}) + y_j^2 = 0 \qquad (7.2.4)$$

Then, we can use the Lagrange multiplier method to handle it like an equality-constrained problem.

Example 7.1. Minimization by the Lagrange Multiplier Method. 

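Consider the following minimization problem subject to a single equality constraint:

$$\text{Min } f(\mathbf{x}) = x_1^2 + x_2^2 \qquad (E7.1.1a)$$

$$\text{s.t. } h(\mathbf{x}) = x_1 + x_2 - 2 = 0 \qquad (E7.1.1b)$$

We can substitute the equality constraint $x_2 = 2 - x_1$ into the objective function (E7.1.1a) so that this problem becomes an unconstrained optimization problem as

$$\text{Min } f(x_1) = x_1^2 + (2 - x_1)^2 = 2x_1^2 - 4x_1 + 4 \qquad (E7.1.2)$$

which can be easily solved by setting the derivative of this new objective function with respect to $x_1$ to zero:

$$\frac{\partial}{\partial x_1} f(x_1) = 4x_1 - 4 = 0, \quad x_1 = 1, \quad x_2 \overset{(E7.1.1b)}{=} 2 - x_1 = 1 \qquad (E7.1.3)$$

Alternatively, we can apply the Lagrange multiplier method as follows:

$$\text{Min } l(\mathbf{x}, \lambda) \overset{(7.2.2)}{=} x_1^2 + x_2^2 + \lambda(x_1 + x_2 - 2) \qquad (E7.1.4)$$

$$\frac{\partial}{\partial x_1} l(\mathbf{x}, \lambda) \overset{(7.2.3a)}{=} 2x_1 + \lambda = 0, \quad x_1 = -\lambda/2 \qquad (E7.1.5a)$$

$$\frac{\partial}{\partial x_2} l(\mathbf{x}, \lambda) \overset{(7.2.3a)}{=} 2x_2 + \lambda = 0, \quad x_2 = -\lambda/2 \qquad (E7.1.5b)$$

$$\frac{\partial}{\partial \lambda} l(\mathbf{x}, \lambda) \overset{(7.2.3b)}{=} x_1 + x_2 - 2 = 0 \qquad (E7.1.5c)$$

$$x_1 + x_2 \overset{(E7.1.5c)}{=} 2 \;\xrightarrow{(E7.1.5a,b)}\; -\lambda/2 - \lambda/2 = -\lambda = 2, \quad \lambda = -2 \qquad (E7.1.6)$$

$$x_1 \overset{(E7.1.5a)}{=} -\lambda/2 = 1, \quad x_2 \overset{(E7.1.5b)}{=} -\lambda/2 = 1 \quad (\text{Fig. } 7.10) \qquad (E7.1.7)$$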

In this example, the substitution of the (linear) equality constraint is more convenient than the Lagrange multiplier method. However, this is not always the case, as illustrated by the next example.
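Because the objective of Example 7.1 is quadratic and its constraint is linear, the conditions (E7.1.5a-c) form a linear system in (x1, x2, λ). The small Python sketch below solves it with our own Gaussian-elimination helper (a hypothetical name, not a library routine) and reproduces (E7.1.6) and (E7.1.7).

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))   # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                         # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


# Conditions (E7.1.5a-c) in the unknowns (x1, x2, lambda):
#   2*x1          + lambda = 0
#          2*x2   + lambda = 0
#     x1 +   x2            = 2
A = [[2, 0, 1],
     [0, 2, 1],
     [1, 1, 0]]
b = [0, 0, 2]
x1, x2, lam = solve(A, b)
print(x1, x2, lam)
```

For a nonlinear constraint, as in the next example, the stationarity conditions are no longer linear and such a direct solve does not apply.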

Example 7.2. Minimization by the Lagrange Multiplier Method. 

Consider the following minimization problem subject to a single nonlinear 
equality constraint: 


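$$\text{Min } f(\mathbf{x}) = x_1 + x_2 \qquad (E7.2.1a)$$

$$\text{s.t. } h(\mathbf{x}) = x_1^2 + x_2^2 - 2 = 0 \qquad (E7.2.1b)$$

Noting that it is awkward to substitute the equality constraint (E7.2.1b) into the objective function (E7.2.1a) (solving it for $x_2$ gives two branches $x_2 = \pm\sqrt{2 - x_1^2}$), we apply the Lagrange multiplier method as below.

$$\text{Min } l(\mathbf{x}, \lambda) \overset{(7.2.2)}{=} x_1 + x_2 + \lambda(x_1^2 + x_2^2 - 2) \qquad (E7.2.3)$$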
Figure 7.11 The objective function with constraint for Example 7.2.


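$$\frac{\partial}{\partial x_1} l(\mathbf{x}, \lambda) \overset{(7.2.3a)}{=} 1 + 2\lambda x_1 = 0, \quad x_1 = -1/(2\lambda) \qquad (E7.2.4a)$$

$$\frac{\partial}{\partial x_2} l(\mathbf{x}, \lambda) \overset{(7.2.3a)}{=} 1 + 2\lambda x_2 = 0, \quad x_2 = -1/(2\lambda) \qquad (E7.2.4b)$$

$$\frac{\partial}{\partial \lambda} l(\mathbf{x}, \lambda) \overset{(7.2.3b)}{=} x_1^2 + x_2^2 - 2 = 0 \qquad (E7.2.4c)$$

$$(E7.2.4c) \;\xrightarrow{(E7.2.4a,b)}\; \left(-\frac{1}{2\lambda}\right)^2 + \left(-\frac{1}{2\lambda}\right)^2 = 2, \quad \lambda = \pm 1/2 \qquad (E7.2.5)$$

$$x_1 \overset{(E7.2.4a)}{=} -1/(2\lambda) = \mp 1, \quad x_2 \overset{(E7.2.4b)}{=} -1/(2\lambda) = \mp 1 \qquad (E7.2.6)$$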


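Now, in order to tell whether each of these is a minimum or a maximum, we should determine the positive/negative-definiteness of the second derivative (Hessian matrix) of $l(\mathbf{x}, \lambda)$ with respect to $\mathbf{x}$:

$$\frac{\partial^2}{\partial \mathbf{x}^2} l(\mathbf{x}, \lambda) = \begin{bmatrix} \partial^2 l/\partial x_1^2 & \partial^2 l/\partial x_1 \partial x_2 \\ \partial^2 l/\partial x_2 \partial x_1 & \partial^2 l/\partial x_2^2 \end{bmatrix} = \begin{bmatrix} 2\lambda & 0 \\ 0 & 2\lambda \end{bmatrix} \qquad (E7.2.7)$$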


This matrix is positive/negative-definite if the sign of $\lambda$ is positive/negative. Therefore, the solution $(x_1, x_2) = (-1, -1)$ corresponding to $\lambda = 1/2$ is a (local) minimum that we want to get, while the solution $(x_1, x_2) = (1, 1)$ corresponding to $\lambda = -1/2$ is a (local) maximum (see Fig. 7.11).
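The conclusions of Example 7.2 can be checked with a few lines of Python (a sketch; the variable names are ours). For each root λ = ±1/2 it evaluates the stationary point (E7.2.4a,b), the residuals of (E7.2.4a-c), and the sign of the Hessian 2λI from (E7.2.7). As an independent check that uses no multiplier at all, it also scans the constraint circle x = √2(cos t, sin t) for the smallest value of x1 + x2.

```python
import math

results = {}
for lam in (0.5, -0.5):                            # the two roots of (E7.2.5)
    x1 = x2 = -1.0 / (2.0 * lam)                   # (E7.2.4a,b)
    grad = (1 + 2 * lam * x1, 1 + 2 * lam * x2)    # dl/dx1, dl/dx2
    h = x1**2 + x2**2 - 2                          # dl/dlambda, i.e. (E7.2.4c)
    kind = "minimum" if 2 * lam > 0 else "maximum" # Hessian = 2*lambda*I
    results[lam] = {"x": (x1, x2), "f": x1 + x2, "grad": grad, "h": h, "kind": kind}

# Independent check: walk around the circle x1^2 + x2^2 = 2 in 0.1-degree steps
ts = [2 * math.pi * k / 3600 for k in range(3600)]
t_best = min(ts, key=lambda t: math.cos(t) + math.sin(t))
x_best = (math.sqrt(2) * math.cos(t_best), math.sqrt(2) * math.sin(t_best))

for lam, r in results.items():
    print(lam, r["x"], r["f"], r["kind"])
print("scan:", x_best)
```

The scan lands on (-1, -1), the point the Hessian test identifies as the constrained minimum.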


7.2.2 Penalty Function Method 

This method is very useful in practice for dealing with general constrained optimization problems involving equality/inequality constraints. It is especially attractive for optimization problems with fuzzy or loose constraints that need not be satisfied with zero tolerance.

Consider the following problem. 

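$$\text{Min } f(\mathbf{x}) \qquad (7.2.5a)$$

$$\text{s.t. } \mathbf{h}(\mathbf{x}) = \begin{bmatrix} h_1(\mathbf{x}) \\ \vdots \\ h_M(\mathbf{x}) \end{bmatrix} = \mathbf{0}, \quad \mathbf{g}(\mathbf{x}) = \begin{bmatrix} g_1(\mathbf{x}) \\ \vdots \\ g_L(\mathbf{x}) \end{bmatrix} \le \mathbf{0} \qquad (7.2.5b)$$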


The penalty function method consists of two steps. The first step is to construct a new objective function

$$\text{Min } l(\mathbf{x}) = f(\mathbf{x}) + \sum_{m=1}^{M} w_m h_m^2(\mathbf{x}) + \sum_{m=1}^{L} v_m \psi_m(g_m(\mathbf{x})) \qquad (7.2.6)$$

by including the constraint terms in such a way that violating the constraints would be penalized through large values of the constraint terms in the objective function, while satisfying the constraints would not affect the objective function. The second step is to minimize the new objective function with no constraints by using a method applicable to unconstrained optimization problems, preferably a non-gradient-based approach like the Nelder method. Why don't we use a gradient-based optimization method? Because the inequality constraint terms $v_m \psi_m(g_m(\mathbf{x}))$ attached to the objective function are often determined to be zero as long as $\mathbf{x}$ stays inside the (permissible) region satisfying the corresponding constraint ($g_m(\mathbf{x}) \le 0$) and to increase very steeply (like $\psi_m(g_m(\mathbf{x})) = \exp(e_m g_m(\mathbf{x}))$) as $\mathbf{x}$ goes out of the region; consequently, the gradient of the new objective function may not carry useful information about the direction along which the value of the objective function decreases.

From an application point of view, a good feature of this method is that we can make the weighting coefficients ($w_m$, $v_m$, and $e_m$) on each penalizing constraint term either large or small depending on how strictly it should be satisfied.

Let us see the following example. 


Example 7.3. Minimization by the Penalty Function Method. 

Consider the following minimization problem subject to several nonlinear 
inequality constraints: 

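$$\text{Min } f(\mathbf{x}) = \{(x_1 + 1.5)^2 + 5(x_2 - 1.7)^2\}\{(x_1 - 1.4)^2 + 0.6(x_2 - 0.5)^2\} \qquad (E7.3.1a)$$

$$\text{s.t. } \mathbf{g}(\mathbf{x}) = \begin{bmatrix} -x_1 \\ -x_2 \\ 3x_1 - x_1x_2 + 4x_2 - 7 \\ 2x_1 + x_2 - 3 \\ 3x_1 - 4x_2^2 - 4x_2 \end{bmatrix} \le \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \qquad (E7.3.1b)$$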


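According to the penalty function method, we construct a new objective function (7.2.6) as

$$\text{Min } l(\mathbf{x}) = \{(x_1 + 1.5)^2 + 5(x_2 - 1.7)^2\}\{(x_1 - 1.4)^2 + 0.6(x_2 - 0.5)^2\} + \sum_{m=1}^{5} v_m \psi_m(g_m(\mathbf{x})) \qquad (E7.3.2a)$$

where

$$v_m = 1, \quad \psi_m(g_m(\mathbf{x})) = \begin{cases} 0 & \text{if } g_m(\mathbf{x}) \le 0 \text{ (constraint satisfied)} \\ \exp(e_m g_m(\mathbf{x})) & \text{if } g_m(\mathbf{x}) > 0 \text{ (constraint violated)} \end{cases}, \quad e_m = 1 \;\; \forall\, m = 1, \ldots, 5 \qquad (E7.3.2b)$$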
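The augmented objective (E7.3.2a,b) is easy to write down outside MATLAB as well. The Python sketch below mirrors the M-file “f722p()” used in the program that follows; the names f_obj, constraints and penalized are ours. Note that each penalty term jumps from 0 to at least v_m as x crosses the boundary g_m(x) = 0, which is exactly the kind of non-smoothness that misleads gradient-based routines.

```python
import math


def f_obj(x):
    """Objective function (E7.3.1a)."""
    x1, x2 = x
    return ((x1 + 1.5)**2 + 5 * (x2 - 1.7)**2) * ((x1 - 1.4)**2 + 0.6 * (x2 - 0.5)**2)


def constraints(x):
    """Left-hand sides g_m(x) of the inequality constraints (E7.3.1b)."""
    x1, x2 = x
    return [-x1, -x2, 3*x1 - x1*x2 + 4*x2 - 7, 2*x1 + x2 - 3, 3*x1 - 4*x2**2 - 4*x2]


def penalized(x, v=(1.0,) * 5, e=(1.0,) * 5):
    """Augmented objective (E7.3.2a): add v_m*exp(e_m*g_m) only for the
    violated constraints (g_m > 0), as in (E7.3.2b)."""
    pen = sum(vm * math.exp(em * gm)
              for vm, em, gm in zip(v, e, constraints(x)) if gm > 0)
    return f_obj(x) + pen


if __name__ == "__main__":
    for x in [(0.5, 0.5), (1.2118, 0.5765), (1.2768, 0.5989)]:
        print(x, f_obj(x), penalized(x), constraints(x))
```

Evaluating the three points shows that the last one has a smaller f(x) only because it violates the fourth constraint, which the penalty term exposes.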


%nm722 for Ex.7.3
% to solve a constrained optimization problem by penalty ftn method.
clear, clf
f = 'f722p';
x0 = [0.4 0.5]
TolX = 1e-4; TolFun = 1e-9; alpha0 = 1;
[xo_Nelder,fo_Nelder] = opt_Nelder(f,x0) %Nelder method
[fc_Nelder,fo_Nelder,co_Nelder] = f722p(xo_Nelder) %its results
[xo_s,fo_s] = fminsearch(f,x0) %MATLAB built-in fminsearch()
[fc_s,fo_s,co_s] = f722p(xo_s) %its results
% including how the constraints are satisfied or violated
xo_steep = opt_steep(f,x0,TolX,TolFun,alpha0) %steepest descent method
[fc_steep,fo_steep,co_steep] = f722p(xo_steep) %its results
[xo_u,fo_u] = fminunc(f,x0); % MATLAB built-in fminunc()
[fc_u,fo_u,co_u] = f722p(xo_u) %its results

function [fc,f,c] = f722p(x)
f = ((x(1) + 1.5)^2 + 5*(x(2) - 1.7)^2)*((x(1) - 1.4)^2 + .6*(x(2) - .5)^2);
c = [-x(1); -x(2); 3*x(1) - x(1)*x(2) + 4*x(2) - 7;
     2*x(1) + x(2) - 3; 3*x(1) - 4*x(2)^2 - 4*x(2)]; %constraint vector
v = [1 1 1 1 1]; e = [1 1 1 1 1]'; %weighting coefficient vector
fc = f + v*((c > 0).*exp(e.*c)); %new objective function
xo_Nelder = 1.2118   0.5765
fo_Nelder = 0.5322 %min value
co_Nelder = -1.2118
            -0.5765
            -1.7573 %high margin
            -0.0000 %no margin
            -0.0000 %no margin
xo_s = 1.2118   0.5765
fo_s = 0.5322 %min value

xo_steep = 1.2768   0.5989
fo_steep = 0.2899 %not a minimum
co_steep = -1.2768
           -0.5989
           -1.5386
            0.1525 %violating
           -0.0001
Warning: .. Gradient must be provided
Maximum # of function evaluations exceeded;
xo_u = 1.2843   0.6015
fo_u = 0.2696 %not a minimum


Note that the shape of the penalty function as well as the values of the weighting coefficients are set by the users to cope with their own problems. Then, we apply an unconstrained optimization technique like the Nelder-Mead method, which is not a gradient-based approach. Here, we make the program “nm722.m”, which applies not only the routine “opt_Nelder()” and the MATLAB built-in routine “fminsearch()” for cross-checking, but also the routine “opt_steep()” and the MATLAB built-in routine “fminunc()” in order to show that the gradient-based methods do not work well. As expected, the running results listed above and depicted in Fig. 7.12 show that, for the objective function (E7.3.2a) augmented with the penalized constraint terms, the gradient-based routines “opt_steep()” and “fminunc()” are not as effective as the non-gradient-based routines “opt_Nelder()” and “fminsearch()” in finding the constrained minimum, which is on the intersection of the two boundary curves corresponding to the fourth and fifth constraints of (E7.3.1b). The smaller function values reached by the gradient-based routines come at the price of violating the fourth constraint.

Figure 7.12 The contours for the objective function (E7.3.1a) and the admissible region satisfying the inequality constraints.


7.3 MATLAB BUILT-IN ROUTINES FOR OPTIMIZATION 

In this section, we apply several MATLAB built-in unconstrained optimization routines including “fminsearch()” and “fminunc()” to the same problem, expecting that their nuances will be clarified. Our intention is not to compare or evaluate the performances of these sophisticated routines, but rather to give the readers some feeling for their functional differences. We also introduce the routine “linprog()” implementing the Linear Programming (LP) scheme and “fmincon()” designed for attacking the (most challenging) constrained optimization problems. Interested readers are encouraged to run the tutorial routines “optdemo” or “tutdemo”, which demonstrate the usages and performances of representative built-in optimization routines such as “fminunc()” and “fmincon()”.

7.3.1 Unconstrained Optimization 

To try the unconstrained optimization routines introduced in Section 7.1 and see how they work, we made the MATLAB program “nm731_1.m”, which uses those routines to solve the problem

$$\text{Min } f(\mathbf{x}) = (x_1 - 0.5)^2 (x_1 + 1)^2 + (x_2 + 1)^2 (x_2 - 1)^2 \qquad (7.3.1)$$

where the contours and the (local) maximum/minimum/saddle points of this objective function are depicted in Fig. 7.13.
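Since (7.3.1) is the sum of a function of x1 and a function of x2, its stationary points and their types can be listed exhaustively. The following Python sketch (our own helper names) does so from the signs of the diagonal Hessian entries; these are the points labeled minimum, maximum, and saddle in Fig. 7.13 and in Table 7.2.

```python
def grad(x1, x2):
    """Gradient of (7.3.1), the same expression as g0 in nm731_1."""
    return (2 * (x1 - 0.5) * (x1 + 1) * (2 * x1 + 0.5), 4 * (x2**2 - 1) * x2)


def hess_diag(x1, x2):
    """(7.3.1) is separable, so its Hessian is diagonal. With
    p = (x1 - 0.5)(x1 + 1): d2f/dx1^2 = 2 p'^2 + 2 p p'' = 2(2 x1 + 0.5)^2 + 4 p,
    and d2f/dx2^2 = 12 x2^2 - 4."""
    p = (x1 - 0.5) * (x1 + 1)
    return (2 * (2 * x1 + 0.5)**2 + 4 * p, 12 * x2**2 - 4)


def classify(x1, x2):
    h1, h2 = hess_diag(x1, x2)
    if h1 > 0 and h2 > 0:
        return "minimum"
    if h1 < 0 and h2 < 0:
        return "maximum"
    return "saddle"


# Roots of each gradient component: x1 in {-1, -0.25, 0.5}, x2 in {-1, 0, 1}
points = {(a, b): classify(a, b) for a in (-1.0, -0.25, 0.5) for b in (-1.0, 0.0, 1.0)}
for pt, kind in sorted(points.items()):
    print(pt, kind)
```

There are four minima, one maximum at (-0.25, 0), and four saddle points; a gradient-based routine started near any of them may stop there.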



Figure 7.13 The contours, minima, maxima, and saddle points of the objective function (7.3.1), with the search paths of the steepest descent and Newton methods.
%nm731_1
% to minimize an objective function f(x) by various methods.
clear, clf
% An objective function and its gradient function
f = inline('(x(1) - 0.5).^2.*(x(1) + 1).^2 + (x(2) + 1).^2.*(x(2) - 1).^2','x');
g0 = '[2*(x(1) - 0.5)*(x(1) + 1)*(2*x(1) + 0.5) 4*(x(2)^2 - 1).*x(2)]';
g = inline(g0,'x');
x0 = [0 0.5] %initial guess

[xon,fon] = opt_Nelder(f,x0) %min point, its ftn value by opt_Nelder
[xos,fos] = fminsearch(f,x0) %min point, its ftn value by fminsearch()
[xost,fost] = opt_steep(f,x0) %min point, its ftn value by opt_steep()

TolX = 1e-4; MaxIter = 100;
xont = Newtons(g,x0,TolX,MaxIter);
xont,f(xont) %minimum point and its function value by Newtons()

[xocg,focg] = opt_conjg(f,x0) %min point, its ftn value by opt_conjg()
[xou,fou] = fminunc(f,x0) %min point, its ftn value by fminunc()


Noting that whether each routine succeeds in finding a minimum point depends mainly on the initial value x0, we summarize the results of running those routines with various initial values in Table 7.2. It can be seen from this table that the gradient-based optimization routines like “opt_steep()”, “Newtons()”, “opt_conjg()”, and “fminunc()” sometimes get to a saddle point or even a maximum point (Remark 7.1) and that the routines do not always approach the extremum that is closest to the initial point. It is interesting to note that even the non-gradient-based MATLAB built-in routine “fminsearch()” may get lost, while our routine “opt_Nelder()” works well for this case. We cannot, however, conclude from only one trial that this routine is better than that one, because there may be some problems for which the MATLAB built-in routine works well but our routine does not. What we can say about this is that no human work is free from defects.

Now, we will see a MATLAB built-in routine “lsqnonlin(f,x0,l,u,options,p1,..)”, which presents a nonlinear least-squares (NLLS) solution to


Table 7.2 Results of Running Several Unconstrained Optimization Routines with Various Initial Values

x0            opt_Nelder   fminsearch   opt_steep    Newtons       opt_conjg    fminunc
[0, 0]        [-1, 1]      [0.5, 1]     [0.5, 0]     [-0.25, 0]    [0.5, 0]     [0.5, 0]
              (minimum)    (minimum)    (saddle)     (maximum)     (saddle)     (saddle)
[0, 0.5]      [0.5, 1]     [0.02, 1]    [0.5, 1]     [-0.25, -1]   [0.5, 1]     [0.5, 1]
              (minimum)    (lost)       (minimum)    (saddle)      (minimum)    (minimum)
[0.4, 0.5]    [0.5, 1]     [0.5, 1]     [0.5, 1]     [0.5, -1]     [0.5, 1]     [0.5, 1]
              (minimum)    (minimum)    (minimum)    (minimum)     (minimum)    (minimum)
[-0.5, 0.5]   [0.5, 1]     [-1, 1]      [-1, 1]      [-0.25, -1]   [-1, 1]      [-1, 1]
              (minimum)    (minimum)    (minimum)    (saddle)      (minimum)    (minimum)
[-0.8, 0.5]   [-1, 1]      [-1, 1]      [-1, 1]      [-1, -1]      [-1, 1]      [-1, 1]
              (minimum)    (minimum)    (minimum)    (minimum)     (minimum)    (minimum)
the minimization problem

$$\text{Min } \sum_{n=1}^{N} f_n^2(\mathbf{x}) \qquad (7.3.2)$$

The routine needs at least the vector or matrix function f(x) and the initial guess x0 as its first and second input arguments, where the components of $\mathbf{f}(\mathbf{x}) = [f_1(\mathbf{x}) \cdots f_N(\mathbf{x})]^T$ are squared, summed, and then minimized over x. In order to learn the usage and function of this routine, we made the MATLAB program “nm731_2.m”, which uses it to find a second-degree polynomial approximating the function

$$y = f(x) = \frac{1}{1 + 8x^2} \qquad (7.3.3)$$


For verification, the result of using the NLLS routine “lsqnonlin()” is compared with that obtained from directly applying the routine “polyfits()” introduced in Section 3.8.2.


>> nm731_2
ao_lsq = [-0.1631 -0.0000 0.4653], ao_fit = [-0.1631 -0.0000 0.4653]


%nm731_2 try using lsqnonlin() for a vector-valued objective ftn F(x)
clear, clf
N = 3; a0 = zeros(1,N); %the initial guess of polynomial coefficient vector
ao_lsq = lsqnonlin('f731_2',a0) %parameter estimate by lsqnonlin()
xx = -2 + [0:200]/50; fx = 1./(1 + 8*xx.*xx);
ao_fit = polyfits(xx,fx,N - 1) %parameter estimate by polyfits()

function F = f731_2(a)
%error between the polynomial a(x) and f(x) = 1/(1+8x^2)
xx = -2 + [0:200]/50; F = polyval(a,xx) - 1./(1 + 8*xx.*xx);
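To confirm the printed coefficients without MATLAB, the same least-squares problem can be solved directly from the normal equations. This Python sketch uses Cramer's rule on the 3 x 3 system (our own choice for brevity, not what “lsqnonlin()” or “polyfits()” does internally) and the same 201 sample points as “f731_2()”; the coefficients come out highest power first, as with polyval().

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least-squares coefficients [a2, a1, a0] from the normal equations
    V^T V a = V^T y, solved by Cramer's rule; V has rows [x^2, x, 1]."""
    V = [[x * x, x, 1.0] for x in xs]
    A = [[sum(r[i] * r[j] for r in V) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * y for r, y in zip(V, ys)) for i in range(3)]
    d = det3(A)
    coeffs = []
    for j in range(3):
        Aj = [row[:j] + [bi] + row[j + 1:] for row, bi in zip(A, b)]
        coeffs.append(det3(Aj) / d)
    return coeffs


xx = [-2 + i / 50 for i in range(201)]        # same sample points as f731_2()
fx = [1 / (1 + 8 * x * x) for x in xx]        # Eq. (7.3.3)
ao = fit_quadratic(xx, fx)
print(["%.4f" % a for a in ao])
```

The odd-power coefficient vanishes (up to rounding) because both the sample points and f(x) are symmetric about x = 0.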


7.3.2 Constrained Optimization 

Generally, constrained optimization is very complicated and difficult to deal with. So we will not cover the topic in detail here; instead, we will just introduce the powerful MATLAB built-in routine “fmincon()”, which relieves us of a big headache.

This routine is well-designed for attacking the optimization problems subject 
to some constraints: 


function [c,ceq] = f722c(x)
c = [-x(1); -x(2); 3*x(1) - x(1)*x(2) + 4*x(2) - 7;
     2*x(1) + x(2) - 3; 3*x(1) - 4*x(2)^2 - 4*x(2)]; %inequality constraints
ceq = []; %equality constraints







(Usage of the MATLAB 6.x built-in function “fmincon()”)

[xo,fo,..] = fmincon('ftn',x0,A,b,Aeq,beq,l,u,'nlcon',options,p1,p2,..)

• Input Arguments (at least four input arguments 'ftn', x0, A and b required)

'ftn'  : an objective function f(x) to be minimized, usually defined in an M-file, but can be defined as an inline function, which will remove the necessity of quotes (' ').

x0     : an initial guess x0 of the solution

A, b   : linear inequality constraints Ax ≤ b; to be given as [] if not applied.

Aeq,beq: linear equality constraints Aeq x = beq; to be given as [] if not applied.

l, u   : lower/upper bound vectors such that l ≤ x ≤ u; to be given as [] if not applied; set l(i) = -inf/u(i) = inf if x(i) is not bounded below/above.

'nlcon': a nonlinear constraint function defined in an M-file, supposed to return two output arguments for a given x, the first one being the LHS (vector) of the inequality constraints c(x) ≤ 0 and the second one being the LHS (vector) of the equality constraints ceq(x) = 0; to be given as [] if not applied.

options: used for setting the display parameter, the tolerances for xo and f(xo), and so on; to be given as [] if not applied. For details, type ‘help optimset’ into the MATLAB command window.

p1, p2,..: the problem-dependent parameters to be passed to the objective function f(x) and the nonlinear constraint functions c(x), ceq(x).

• Output Arguments

xo : the minimum point (xo) reached in the permissible region satisfying the constraints

fo : the minimized function value f(xo)


%nm732_1 to solve a constrained optimization problem by fmincon()
clear, clf
ftn = '((x(1) + 1.5)^2 + 5*(x(2) - 1.7)^2)*((x(1) - 1.4)^2 + .6*(x(2) - .5)^2)';
f722o = inline(ftn,'x');
x0 = [0 0.5] %initial guess
A = []; B = []; Aeq = []; Beq = []; %no linear constraints
l = -inf*ones(size(x0)); u = inf*ones(size(x0)); % no lower/upper bound
options = optimset('LargeScale','off'); %just [] is OK.
[xo_con,fo_con] = fmincon(f722o,x0,A,B,Aeq,Beq,l,u,'f722c',options)
[co,ceqo] = f722c(xo_con) % to see how constraints are.
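$$\text{Min } f(\mathbf{x}) \qquad (7.3.4)$$

$$\text{s.t. } \mathbf{A}\mathbf{x} \le \mathbf{b}, \;\; \mathbf{A}_{eq}\mathbf{x} = \mathbf{b}_{eq}, \;\; \mathbf{c}(\mathbf{x}) \le \mathbf{0}, \;\; \mathbf{c}_{eq}(\mathbf{x}) = \mathbf{0}, \;\text{and}\; \mathbf{l} \le \mathbf{x} \le \mathbf{u} \qquad (7.3.5)$$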


A part of its usage can be seen by typing ‘help fmincon’ into the MATLAB command window, as summarized in the above box. We make the MATLAB program “nm732_1.m”, which uses the routine “fmincon()” to solve the problem presented in Example 7.3. Interested readers are welcome to run it and observe the result to check if it agrees with that of Example 7.3.

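There are two more MATLAB built-in routines to be introduced in this section. One is

“fminimax('ftn',x0,A,b,Aeq,beq,l,u,'nlcon',options,p1,..)”

which is focused on minimizing the maximum among several components of the vector/matrix-valued objective function $\mathbf{f}(\mathbf{x}) = [f_1(\mathbf{x}) \cdots f_N(\mathbf{x})]^T$ subject to some constraints as described below. Its usage is almost the same as that of “fmincon()”.

$$\text{Min}_{\mathbf{x}} \{\text{Max}_{n} \{f_n(\mathbf{x})\}\} \qquad (7.3.6)$$

$$\text{s.t. } \mathbf{A}\mathbf{x} \le \mathbf{b}, \;\; \mathbf{A}_{eq}\mathbf{x} = \mathbf{b}_{eq}, \;\; \mathbf{c}(\mathbf{x}) \le \mathbf{0}, \;\; \mathbf{c}_{eq}(\mathbf{x}) = \mathbf{0}, \;\text{and}\; \mathbf{l} \le \mathbf{x} \le \mathbf{u} \qquad (7.3.7)$$

The other is the constrained linear least-squares (LLS) routine

“lsqlin(C,d,A,b,Aeq,beq,l,u,x0,options,p1,..)”

whose job is to solve the problem

$$\text{Min } \|\mathbf{C}\mathbf{x} - \mathbf{d}\|^2 \qquad (7.3.8)$$

$$\text{s.t. } \mathbf{A}\mathbf{x} \le \mathbf{b}, \;\; \mathbf{A}_{eq}\mathbf{x} = \mathbf{b}_{eq}, \;\text{and}\; \mathbf{l} \le \mathbf{x} \le \mathbf{u} \qquad (7.3.9)$$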

In order to learn the usage and function of these routines, we make the MATLAB program “nm732_2.m”, which uses both “fminimax()” and “lsqlin()” to find a second-degree polynomial approximating the function (7.3.3) and compares the results with that of applying the routine “lsqnonlin()” introduced in the previous section for verification. From the plotting result depicted in Fig. 7.14, note the following.

• We attached no constraints to the “fminimax()” routine, so it yielded the approximate polynomial curve minimizing the maximum deviation from f(x).

• We attached no constraints to the constrained linear least-squares routine “lsqlin()” either, so it yielded the approximate polynomial curve minimizing the sum (integral) of squared deviation from f(x), which is the same as the (unconstrained) least-squares solution obtained by using the routine “lsqnonlin()”.

• Another MATLAB built-in routine “lsqnonneg()” gives us a nonnegative LS (NLS) solution to the problem (7.3.8).

Figure 7.14 Approximation of a curve by a second-degree polynomial function based on the minimax, least-squares, and Chebyshev methods.


%nm732_2: uses fminimax() for a vector-valued objective ftn f(x)
clear, clf
f = inline('1./(1+8*x.*x)','x');
f73221 = inline('abs(polyval(a,x) - fx)','a','x','fx');
f73222 = inline('polyval(a,x) - fx','a','x','fx');
N = 2; % the degree of approximating polynomial
a0 = zeros(1,N+1); %initial guess of polynomial coefficients
xx = -2+[0:200]'/50; %intermediate points
fx = feval(f,xx); % and their function values f(xx)
ao_m = fminimax(f73221,a0,[],[],[],[],[],[],[],[],xx,fx) %fminimax sol
for n = 1:N+1, C(:,n) = xx.^(N+1-n); end
ao_ll = lsqlin(C,fx,[],[]) %linear LS to minimize (Ca - fx)^2 with no constraint
ao_ln = lsqnonlin(f73222,a0,[],[],[],xx,fx) %nonlinear LS
c2 = cheby(f,N,-2,2) %Chebyshev polynomial over [-2,2]
plot(xx,fx,':', xx,polyval(ao_m,xx),'m', xx,polyval(ao_ll,xx),'r')
hold on, plot(xx,polyval(ao_ln,xx),'b', xx,polyval(c2,xx),'--')
axis([-2 2 -0.4 1.1])
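To make concrete what the least-squares curve in Fig. 7.14 minimizes, here is a minimal pure-Python sketch. It is not one of the book's programs, and the helper names `solve` and `polyval` are our own; `polyval` mimics MATLAB's descending-power ordering. The sketch builds the same matrix C as nm732_2 for f(x) = 1/(1 + 8x^2) on xx = -2:0.02:2 and solves the normal equations C^T C a = C^T f directly. It then reports the maximum deviation |p(x) - f(x)|, which is the quantity fminimax() would minimize instead. lsqlin() need not use the normal equations internally; they are simply the shortest route for a 3 × 3 system.

```python
def f(x):
    """The function approximated in nm732_2."""
    return 1.0 / (1.0 + 8.0 * x * x)


def solve(M, y):
    """Solve the small dense system M a = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    A = [row[:] + [y[i]] for i, row in enumerate(M)]   # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= m * A[k][j]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        s = sum(A[i][j] * a[j] for j in range(i + 1, n))
        a[i] = (A[i][n] - s) / A[i][i]
    return a


def polyval(a, x):
    """Evaluate a polynomial whose coefficients are in descending powers (MATLAB order)."""
    y = 0.0
    for c in a:
        y = y * x + c
    return y


N = 2                                                   # degree of the approximating polynomial
xx = [-2 + k / 50 for k in range(201)]                  # same sample points as nm732_2
fx = [f(x) for x in xx]
C = [[x ** (N - n) for n in range(N + 1)] for x in xx]  # columns x^2, x, 1 as in nm732_2

# Normal equations (C^T C) a = C^T fx give the least-squares coefficients
CtC = [[sum(r[i] * r[j] for r in C) for j in range(N + 1)] for i in range(N + 1)]
Ctf = [sum(r[i] * y for r, y in zip(C, fx)) for i in range(N + 1)]
a_ls = solve(CtC, Ctf)

# The quantity fminimax() minimizes instead: the largest pointwise deviation
max_dev = max(abs(polyval(a_ls, x) - y) for x, y in zip(xx, fx))
print("LS coefficients:", a_ls)
print("max |p(x) - f(x)| of the LS fit:", max_dev)
```

Because f is even and the sample points are symmetric about 0, the fitted x^1 coefficient comes out numerically zero.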


7.3.3 Linear Programming (LP) 

The linear programming (LP) scheme implemented by the MATLAB built-in routine

"[xo,fo] = linprog(f,A,b,Aeq,beq,l,u,x0,options)"

is designed to solve an LP problem, which is a constrained minimization problem of the following form.


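$$\min_{\mathbf{x}} f(\mathbf{x}) = \mathbf{f}^T\mathbf{x} \tag{7.3.10a}$$

$$\text{subject to } A\mathbf{x} \le \mathbf{b},\; A_{eq}\mathbf{x} = \mathbf{b}_{eq},\; \text{and } \mathbf{l} \le \mathbf{x} \le \mathbf{u} \tag{7.3.10b}$$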
%nm733 to solve a Linear Programming problem.
% Min f*x=-3*x(1)-2*x(2) s.t. Ax <= b, Aeq*x = beq and l <= x <= u
x0 = [0 0]; %initial point
f = [-3 -2]; %the coefficient vector of the objective function
A = [3 4; 2 1]; b = [7; 3]; %the inequality constraint Ax <= b
Aeq = [-3 2]; beq = 2; %the equality constraint Aeq*x = beq
l = [0 0]; u = [10 10]; %lower/upper bound l <= x <= u
[xo_lp,fo_lp] = linprog(f,A,b,Aeq,beq,l,u)
cons_satisfied = [A; Aeq]*xo_lp-[b; beq] %how constraints are satisfied
f733o = inline('-3*x(1)-2*x(2)', 'x');
[xo_con,fo_con] = fmincon(f733o,x0,A,b,Aeq,beq,l,u)


It produces the solution (column) vector $\mathbf{x}_o$ and the minimized value of the objective function $f(\mathbf{x}_o)$ as its first and second output arguments xo and fo. Here the objective function and the constraints (excluding the constant terms) are linear in the independent (decision) variables. For such linear optimization problems as (7.3.10), it works more efficiently than the general constrained optimization routine “fmincon()”.

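The usage of the routine “linprog()” is exemplified by the MATLAB program “nm733.m”, which uses the routine to solve an LP problem described as

$$\min_{\mathbf{x}} f(\mathbf{x}) = \mathbf{f}^T\mathbf{x} = [-3\;\; -2]\,[x_1\;\; x_2]^T = -3x_1 - 2x_2 \tag{7.3.11a}$$

$$\text{subject to } \begin{bmatrix} 3 & 4 \\ 2 & 1 \end{bmatrix}\mathbf{x} \le \begin{bmatrix} 7 \\ 3 \end{bmatrix},\quad [-3\;\; 2]\,\mathbf{x} = 2,\quad \text{and } \begin{bmatrix} 0 \\ 0 \end{bmatrix} \le \mathbf{x} \le \begin{bmatrix} 10 \\ 10 \end{bmatrix} \tag{7.3.11b}$$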



Figure 7.15 The objective function, constraints, and solutions of an LP problem. 
Table 7.3 The Names of MATLAB Built-In Minimization Routines in MATLAB 5.x/6.x

| Minimization | Bracketing | Non-Gradient-Based | Gradient-Based | Linear | Nonlinear | Nonlinear LS | Linear LS | Minimax |
|---|---|---|---|---|---|---|---|---|
| MATLAB 5.x | fmin | fmins | fminu | lp | constr | leastsq | conls | minimax |
| MATLAB 6.x | fminbnd | fminsearch | fminunc | linprog | fmincon | lsqnonlin | lsqlin | fminimax |

(Bracketing, Non-Gradient-Based, and Gradient-Based are unconstrained minimization methods; Linear through Minimax are constrained minimization methods.)


The program also applies the general constrained minimization routine “fmincon()” to the same problem as a cross-check. Readers are welcome to run the program and see the results.

>> nm733
xo_lp = [0.3333 1.5000], fo_lp = -4.0000
cons_satisfied = -0.0000 % <= 0 (inequality)
                 -0.8333 % <= 0 (inequality)
                 -0.0000 % = 0 (equality)
xo_con = [0.3333 1.5000], fo_con = -4.0000

In this result, the solutions obtained by using the two routines “linprog()” and “fmincon()” agree with each other and satisfy the inequality/equality constraints, as can also be confirmed from Fig. 7.15.
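Figure 7.15 suggests why the optimum sits at a corner of the feasible set: for a two-variable LP with a bounded optimum, some vertex of the feasible polygon is optimal. The brute-force Python sketch below is our own illustration and not how linprog() works internally (linprog() uses simplex- or interior-point-type algorithms, depending on version and options). It intersects every pair of constraint boundary lines of the nm733 problem, keeps the feasible intersections, and picks the one with the smallest f^T x.

```python
from itertools import combinations

f = (-3.0, -2.0)                  # objective coefficients: minimize f . x
ineq = [((3.0, 4.0), 7.0),        # 3x1 + 4x2 <= 7
        ((2.0, 1.0), 3.0),        # 2x1 +  x2 <= 3
        ((-1.0, 0.0), 0.0),       # x1 >= 0
        ((0.0, -1.0), 0.0),       # x2 >= 0
        ((1.0, 0.0), 10.0),       # x1 <= 10
        ((0.0, 1.0), 10.0)]       # x2 <= 10
eq = [((-3.0, 2.0), 2.0)]         # -3x1 + 2x2 = 2
TOL = 1e-9


def dot(a, x):
    return a[0] * x[0] + a[1] * x[1]


def feasible(x):
    """Check every inequality and equality constraint within a small tolerance."""
    return (all(dot(a, x) <= b + TOL for a, b in ineq)
            and all(abs(dot(a, x) - b) <= TOL for a, b in eq))


best = None
for (a1, b1), (a2, b2) in combinations(ineq + eq, 2):
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if abs(det) < 1e-12:          # parallel boundary lines have no unique intersection
        continue
    x = ((b1 * a2[1] - a1[1] * b2) / det,   # Cramer's rule for the 2x2 system
         (a1[0] * b2 - b1 * a2[0]) / det)
    if feasible(x) and (best is None or dot(f, x) < dot(f, best)):
        best = x

print("best vertex:", best, "objective:", dot(f, best))
```

Enumerating vertices grows combinatorially with the number of constraints, which is why practical LP solvers such as linprog() avoid it.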

Table 7.3 lists the names of the MATLAB built-in minimization routines in MATLAB versions 5.x and 6.x.


PROBLEMS 

7.1 Modification of Golden Search Method 

In fact, the golden search method explained in Section 7.1 requires only one function evaluation per iteration, since one point of a new interval coincides with a point of the previous interval, so that only one trial point is updated. In spite of this fact, the MATLAB routine “opt_gs()” implementing the method performs two function evaluations per iteration. An improvement may be initiated by modifying the declaration as

[xo,fo] = opt_gs1(f,a,e,fe,r1,b,r,TolX,TolFun,k)

so that anyone can use the new routine as in the following program. Its input argument list contains another point (e) besides the end points (a, b) of the next interval, the function value (fe) at that point, and a parameter (r1) specifying whether the point is the left one or the right one. Based on this idea, how would you revise the routine “opt_gs()” to cut down the number of function evaluations?
%nm7p01.m to perform the revised golden search method
f701 = inline('x.*(x-2)', 'x');
a = 0; b = 3; r = (sqrt(5)-1)/2;
TolX = 1e-4; TolFun = 1e-4; MaxIter = 100;
h = b - a; rh = r*h;
c = b - rh; d = a + rh;
fc = f701(c); fd = f701(d);
if fc < fd, [xo,fo] = opt_gs1(f701,a,c,fc,1 - r,d,r,TolX,TolFun,MaxIter)
else [xo,fo] = opt_gs1(f701,c,d,fd,r,b,r,TolX,TolFun,MaxIter)
end
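The bookkeeping that Problem 7.1 asks for can be sketched in Python as a loop. The helper `golden_search` is hypothetical and does not follow the opt_gs1 interface above, which passes the retained point through the argument list. After the first two evaluations, each iteration keeps one interior point together with its function value and evaluates f only at the single new point.

```python
import math


def golden_search(f, a, b, tol=1e-4, max_iter=100):
    """Golden-section search reusing one interior point per iteration.

    Returns (xo, fo, n_evals), where n_evals counts the calls to f."""
    r = (math.sqrt(5) - 1) / 2              # golden-ratio factor, about 0.618
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    n_evals = 2
    for _ in range(max_iter):
        if b - a < tol:
            break
        if fc < fd:      # minimum lies in [a, d]: old c becomes the new d
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:            # minimum lies in [c, b]: old d becomes the new c
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
        n_evals += 1     # exactly one new evaluation per iteration
    xo, fo = (c, fc) if fc < fd else (d, fd)
    return xo, fo, n_evals


xo, fo, n_evals = golden_search(lambda x: x * (x - 2), 0.0, 3.0)
print(xo, fo, n_evals)
```

The reuse works because r^2 = 1 - r: the surviving interior point lands exactly where the next interval needs one of its two trial points.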


7.2 Nelder-Mead, Steepest Descent, Newton, SA, GA and fminunc(), fminsearch()

Consider a two-variable objective function 

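$$f(\mathbf{x}) = x_1^4 - 12x_1^2 - 4x_1 + x_2^4 - 16x_2^2 - 5x_2 - 20\cos(x_1 - 2.5)\cos(x_2 - 2.9) \tag{P7.2.1}$$

whose gradient vector function is

$$\mathbf{g}(\mathbf{x}) = \nabla f(\mathbf{x}) = \begin{bmatrix} 4x_1^3 - 24x_1 - 4 + 20\sin(x_1 - 2.5)\cos(x_2 - 2.9) \\ 4x_2^3 - 32x_2 - 5 + 20\cos(x_1 - 2.5)\sin(x_2 - 2.9) \end{bmatrix} \tag{P7.2.2}$$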

You have the MATLAB functions f7p02() and g7p02() defining the objective function f(x) and its gradient function g(x). You also have part of a MATLAB program that plots mesh/contour-type graphs of f(x). Note that this gradient function has nine zeros, as listed in Table P7.2.1.


Table P7.2.1 Extrema (Maxima/Minima) and Saddle Points of the Function (P7.2.1)

| Point | [x1 x2] | Signs of ∂²f/∂x1², ∂²f/∂x2² | Character |
|---|---|---|---|
| (1) | [0.6965 -0.1423] | | M |
| (2) | [2.5463 -0.1896] | | |
| (3) | [2.5209 2.9027] | +, + | G |
| (4) | [-0.3865 2.9049] | | |
| (5) | [-2.6964 2.9031] | | |
| (6) | [-1.6926 -0.1183] | | |
| (7) | [-2.6573 -2.8219] | +, + | m |
| (8) | [-0.3227 -2.4257] | | |
| (9) | [2.5216 -2.8946] | +, + | m |

(a) From the graphs (including Fig. P7.2) that you get by running the (unfinished) program, determine the characteristic of each of the nine points, that is, whether it is a local maximum (M)/minimum (m), the global minimum (G), or a saddle point (S), which is a minimum with respect to one variable and a maximum with respect to another variable. Support your judgment by stating the signs of the second derivatives of f(x) with respect to x1 and x2.

Figure P7.2 The contour, extrema and saddle points of the objective function (P7.2.1).



$$\frac{\partial^2 f}{\partial x_1^2} = 12x_1^2 - 24 + 20\cos(x_1 - 2.5)\cos(x_2 - 2.9), \qquad \frac{\partial^2 f}{\partial x_2^2} = 12x_2^2 - 32 + 20\cos(x_1 - 2.5)\cos(x_2 - 2.9) \tag{P7.2.3}$$
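To support part (a) numerically, the following Python sketch transcribes (P7.2.1) to (P7.2.3). The names f7p02/g7p02 echo the problem's MATLAB functions, but these are our own Python versions. At each point of Table P7.2.1 the sketch prints the function value, the gradient norm, and the signs of the two second derivatives. As in the problem statement, only the diagonal second derivatives are examined and the mixed derivative is ignored, so this is a quick screen rather than a full Hessian test.

```python
import math


def f7p02(x1, x2):
    """Objective function (P7.2.1)."""
    return (x1**4 - 12*x1**2 - 4*x1 + x2**4 - 16*x2**2 - 5*x2
            - 20*math.cos(x1 - 2.5)*math.cos(x2 - 2.9))


def g7p02(x1, x2):
    """Gradient (P7.2.2)."""
    return (4*x1**3 - 24*x1 - 4 + 20*math.sin(x1 - 2.5)*math.cos(x2 - 2.9),
            4*x2**3 - 32*x2 - 5 + 20*math.cos(x1 - 2.5)*math.sin(x2 - 2.9))


def hess_diag(x1, x2):
    """Second derivatives (P7.2.3): d2f/dx1^2 and d2f/dx2^2."""
    cc = 20*math.cos(x1 - 2.5)*math.cos(x2 - 2.9)
    return 12*x1**2 - 24 + cc, 12*x2**2 - 32 + cc


# The nine zeros of the gradient listed in Table P7.2.1
points = {1: (0.6965, -0.1423), 2: (2.5463, -0.1896), 3: (2.5209, 2.9027),
          4: (-0.3865, 2.9049), 5: (-2.6964, 2.9031), 6: (-1.6926, -0.1183),
          7: (-2.6573, -2.8219), 8: (-0.3227, -2.4257), 9: (2.5216, -2.8946)}

for k, (x1, x2) in points.items():
    g1, g2 = g7p02(x1, x2)
    h1, h2 = hess_diag(x1, x2)
    signs = ('+' if h1 > 0 else '-') + ',' + ('+' if h2 > 0 else '-')
    print(f"({k}) f = {f7p02(x1, x2):9.4f}  |grad| = {math.hypot(g1, g2):.1e}  signs: {signs}")
```

Because the table lists the coordinates to four decimals, the printed gradient norms are small but not exactly zero.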

(b) Apply the Nelder-Mead method, the steepest descent method, the Newton method, simulated annealing (SA), the genetic algorithm (GA), and the MATLAB built-in routines fminunc() and fminsearch() to minimize the objective function (P7.2.1). Fill in Table P7.2.2 with the number and character of the point reached by each method.
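Table P7.2.2 Points Reached by Several Optimization Routines (entries: number/character of the reached point)

| Initial Point x0 | Nelder | Steepest | Newton | fminunc | fminsearch | SA | GA |
|---|---|---|---|---|---|---|---|
| (0, 0) | (5)/m | | | | | | |
| (1, 0) | | (3)/G | | | | | |
| (1, 1) | | | (9)/m | | | | |
| (0, 1) | | | | (3)/G | | | |
| (-1, 1) | | | | | (5)/m | | |
| (-1, 0) | | | | | | (3)/G | |
| (-1, -1) | | | | | | | (3)/G |
| (0, -1) | (9)/m | | | | | | |
| (1, -1) | | (9)/m | | | | | |
| (2, 2) | | | (3)/G | | | | |
| (-2, -2) | | | | (7)/m | | | |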

(c) Overall, the point reached by each minimization algorithm depends on the starting point (that is, the initial value of the independent variable) as well as on the characteristic of the algorithm. Fill in the blanks in the following sentences. Most algorithms succeed in finding the global minimum if only they start from the initial point ( , ), ( , ), ( , ), or ( , ). An algorithm most likely goes to the closest local minimum (5) if launched from ( , ) or ( , ), and it may go to the closest local minimum (7) if launched from ( , ) or ( , ). If launched from ( , ), it may go to one of the two closest local minima (7) and (9), and if launched from ( , ), it most likely goes to the closest local minimum (9). But the global optimization techniques SA and GA seem to work fine almost regardless of the starting point, although not always.

7.3 Minimization of an Objective Function Having Many Local Minima/Maxima

Consider the problem of minimizing the following objective function

$$\min_x f(x) = \frac{\sin(1/x)}{(x - 0.2)^2 + 0.1} \tag{P7.3.1}$$

which is depicted in Fig. P7.3. The graph shows that this function has infinitely many local minima/maxima around x = 0 and the global minimum at about x = 0.2.
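A brute-force check that does not depend on an initial guess is a dense grid scan. The Python sketch below evaluates (P7.3.1) on (0, 1] and returns the best grid point; the grid spacing of 1e-5 is our own choice. The grid cannot resolve the oscillations arbitrarily close to x = 0, but that does not matter here. For x ≤ 0.01 the denominator is at least 0.19^2 + 0.1 = 0.1361, so |f(x)| ≤ 1/0.1361 ≈ 7.35 there, and values that low cannot compete with the values near -10 found around x = 0.2.

```python
import math


def f7p03(x):
    """Objective function (P7.3.1); defined for x != 0."""
    return math.sin(1.0 / x) / ((x - 0.2) ** 2 + 0.1)


# Dense grid over (0, 1]; x = 0 itself is excluded because sin(1/x) is undefined there
h = 1e-5
xs = [k * h for k in range(1, int(round(1 / h)) + 1)]
x_best = min(xs, key=f7p03)
print("grid minimum near x =", x_best, "with f =", f7p03(x_best))
```

The grid result can then be compared with the points reached by fminbnd(), the local routines, SA, and GA in parts (a) to (d).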

(a) Find the solution by using the MATLAB built-in routine “fminbnd()”. Is it plausible?

(b) With nine different values of the initial guess x0 = 0.1, 0.2, ..., 0.9, use the four MATLAB routines “opt_Nelder()”, “opt_steep()”, “fminunc()”, and “fminsearch()” to solve the problem. Among those 36 tryouts, how many times have you got the right solution?

Figure P7.3 The graph of f(x) = sin(1/x)/((x - 0.2)^2 + 0.1) having many local minima/maxima.


(c) With the values of the parameters set to l = 0, u = 1, q = 1, ε_f = 10^-9, k_max = 1000 and the initial guess x0 = 0.1, 0.2, ..., 0.9, use the SA (simulated annealing) routine “sim_anl()” to solve the problem. You can test the performance of the routine and your luck by running the routine four times for the same problem and finding the probability of getting the right solution.

(d) With the values of the parameters set to l = 0, u = 1, N_p = 30, N_b = 12, P_c = 0.5, P_m = 0.01, η = 1, k_max = 1000 and the initial guess x0 = 0.1, 0.2, ..., 0.9, use the GA (genetic algorithm) routine “genetic()” to solve the problem. As in (c), you can run the routine four times for the same problem and find the probability of getting the right solution in order to test the performance of the routine and your luck.

7.4 Linear Programming Method 

Consider the problem of maximizing a linear objective function 

$$\max_{\mathbf{x}} f(\mathbf{x}) = \mathbf{f}^T\mathbf{x} = [3\;\; 2\;\; -1]\,[x_1\;\; x_2\;\; x_3]^T \tag{P7.4.1a}$$


subject to the constraints 
' 3 -2 


■ 

X\ 


"10" 

< X = 

*2 

< 

10 

_ 

_ X 3 _ 


10 


Jessica is puzzled by this problem, which is not a minimization but a maximization. What would you suggest she do? Make a program that uses the MATLAB built-in routines “linprog()” and “fmincon()” to solve this problem, and run it to get the solutions.
7.5 Constrained Optimization and Penalty Method

Consider the problem of minimizing a nonlinear objective function 


$$\min_{\mathbf{x}} f(\mathbf{x}) = -3x_1 - 2x_2 + M(3x_1 - 2x_2 + 2)^2 \quad (M:\ \text{a large positive number}) \tag{P7.5.1a}$$


subject to the constraints 

[.I-i - ■-[:]—[;] 



(P7.5. lb) 


(a) With the two values of the weighting factor M = 20 and 10,000 in the objective function (P7.5.1a), apply the MATLAB built-in routine “fmincon()” to find the solutions to the above constrained minimization problem. To do this, you have to pass the variable parameter M to the objective function (defined in an M-file), either through “fmincon()” or directly by declaring the parameter as global both in the main program and in the M-file defining (P7.5.1a). If you pass the parameter through “fmincon()” to the objective function, you should include the parameter in the input argument list of the objective function as


function f = f7p05M(x,M)
f = -3*x(1)-2*x(2)+M*(3*x(1)-2*x(2)+2).^2;


Additionally, you should give empty matrices ([]) as the ninth input argument (for a nonlinear inequality/equality constraint function ‘nonlcon’) and the 10th one (for ‘options’), and the value of M as the 11th input argument of the routine “fmincon()”.

xo = fmincon('f7p05M',x0, A,b,[],[],l,u,[],[], M) 

For reference, type ‘help fmincon’ into the MATLAB command 
window. 

(b) The third (squared) term of the objective function (P7.5.1a) has its minimum value of zero for 3x1 - 2x2 + 2 = 0, so it actually represents the penalty (Section 7.2.2) imposed for not satisfying the equality constraint

$$3x_1 - 2x_2 + 2 = 0 \tag{P7.5.2}$$

Noting this, tell which of the solutions obtained in (a) is more likely to satisfy this constraint, and support your answer by comparing the values of the left-hand side of this equality for the two solutions.
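A one-variable toy problem shows why a larger M enforces the constraint more tightly. This is our own illustration, not part of Problem 7.5: minimize (x - 3)^2 subject to x = 1 by adding the penalty M(x - 1)^2, which plays the role of the squared term in (P7.5.1a).

```latex
F_M(x) = (x-3)^2 + M(x-1)^2, \qquad
F_M'(x) = 2(x-3) + 2M(x-1) = 0 \;\Longrightarrow\; x_M = \frac{3+M}{1+M}

x_M - 1 = \frac{2}{1+M} =
\begin{cases}
2/21 \approx 0.095, & M = 20 \\
2/10001 \approx 2.0 \times 10^{-4}, & M = 10{,}000
\end{cases}
```

The constraint violation shrinks roughly in proportion to 1/M, which is the qualitative effect part (b) asks about.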
(c) Removing the third term from the objective function and splitting the equality constraint into two reversed inequality constraints, we can modify the problem as follows:

$$\min_{\mathbf{x}} f(\mathbf{x}) = -3x_1 - 2x_2 \tag{P7.5.3a}$$

subject to the constraints 

1 :! [s] S “ d <p7 - 53b) 

3 —2 J > —2 



Noting that this fits the linear programming framework, apply the routine “linprog()” to solve this problem.

(d) Treating the equality constraint separately from the inequality constraints, we can modify the problem as follows:

$$\min_{\mathbf{x}} f(\mathbf{x}) = -3x_1 - 2x_2 \tag{P7.5.4a}$$


subject to the constraints 


3 

3 

-2 



■xii<r io] 

’ . X *\ ~ L 10 J 


(P7.5.4b) 

Apply the two routines “linprog()” and “fmincon()” to solve this problem and see whether the solutions agree with the solution obtained in (c).

(cf) Note that, whereas the routine “fmincon()” can solve a general nonlinear optimization problem, the routine “linprog()” is made solely for the class of optimization problems having a linear objective function with linear constraints.


7.6 Nonnegative Constrained LS and Constrained Optimization 

Consider the problem of minimizing a nonlinear objective function 

$$\min_{\mathbf{x}} \|C\mathbf{x} - \mathbf{d}\|^2 = [C\mathbf{x} - \mathbf{d}]^T[C\mathbf{x} - \mathbf{d}] \tag{P7.6.1a}$$


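subject to the constraints

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \ge \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{P7.6.1b}$$

where

$$C = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 1 \end{bmatrix}, \qquad \mathbf{d} = \begin{bmatrix} 5.1 \\ 10.8 \\ 6.8 \end{bmatrix} \tag{P7.6.1c}$$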





(a) Noting that this problem has no other constraints than the lower bound, 
apply the constrained linear least-squares routine “lsqlin()” to find 
the solution. 

(b) Noting that the lower bounds for all the variables are zeros, apply the MATLAB built-in routine “lsqnonneg()” to find the solution.

(c) Apply the general-purpose constrained optimization routine “fmincon()” to find the solution.
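For this small problem the nonnegativity bounds can be handled by brute force. The optimum of a convex quadratic over x ≥ 0 is one of the following: the unconstrained LS solution (if it is already nonnegative), the best point with x2 = 0 or with x1 = 0 (each a one-variable LS solution clipped at zero), or the origin. The Python sketch below is our own check, using C and d as read from (P7.6.1c), and compares these candidates. lsqnonneg() itself is documented as an active-set method, which scales to many variables.

```python
C = [[1.0, 2.0], [3.0, 4.0], [5.0, 1.0]]   # (P7.6.1c)
d = [5.1, 10.8, 6.8]


def sse(x):
    """Objective (P7.6.1a): squared norm of C x - d."""
    return sum((r[0] * x[0] + r[1] * x[1] - di) ** 2 for r, di in zip(C, d))


# Entries of the normal equations (C^T C) x = C^T d
a11 = sum(r[0] * r[0] for r in C)
a12 = sum(r[0] * r[1] for r in C)
a22 = sum(r[1] * r[1] for r in C)
b1 = sum(r[0] * di for r, di in zip(C, d))
b2 = sum(r[1] * di for r, di in zip(C, d))
det = a11 * a22 - a12 * a12
x_ls = ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)   # unconstrained LS

# Candidate minimizers over x >= 0, one per possible set of active bounds
candidates = [(max(0.0, b1 / a11), 0.0),   # x2 = 0, best x1 >= 0
              (0.0, max(0.0, b2 / a22)),   # x1 = 0, best x2 >= 0
              (0.0, 0.0)]                  # both bounds active
if min(x_ls) >= 0:                         # no bound active
    candidates.append(x_ls)
x_nn = min(candidates, key=sse)
print("unconstrained LS:", x_ls, " nonnegative LS:", x_nn)
```

Here the unconstrained solution is already nonnegative, so the bounds are inactive, and parts (a), (b), and (c) should all return essentially the same point.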

7.7 Constrained Optimization Problems 

Solve the following constrained optimization problems by using the MATLAB built-in routine “fmincon()”.

(a) Min x x\ - 5xj + 6xt + x\ — 2x 2 + x 3 (P7.7.la) 

subject to the constraints 

A + x\ - x 3 < 0 
x\ + x\ + jcf > 6 and x = 
x 3 < 5 

Try the routine “fmincon()” with the initial guesses listed in 
Table P7.7. 


Table P7.7 The Results of Applying “fmincon()” with Different Initial Guesses



Xi 


"0" 

X 2 

> 

0 

*3 


0 
(b1)

$$\max_{\mathbf{x}} x_1x_2x_3 \tag{P7.7.2a}$$

subject to the constraints

$$x_1x_2 + x_2x_3 + x_3x_1 = 3 \quad\text{and}\quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \ge \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{P7.7.2b}$$

Try the routine “fmincon()” with the initial guesses listed in Table P7.7.


(b2)

$$\min_{\mathbf{x}} x_1x_2x_3 \tag{P7.7.3a}$$

subject to the constraints (P7.7.2b).

Try the routine “fmincon()” with the initial guesses listed in Table P7.7.


(c1)

$$\max_{\mathbf{x}} x_1x_2 + x_2x_3 + x_3x_1 \tag{P7.7.4a}$$


subject to the constraints 




and 



'O' 

0 

0 


(P7.7.4b) 


Try the routine “fmincon()” with the initial guesses listed in 
Table P7.7. 


(c2)

$$\min_{\mathbf{x}} x_1x_2 + x_2x_3 + x_3x_1 \tag{P7.7.5a}$$


subject to the constraints (P7.7.4b). 

Try the routine “fmincon()” with the initial guesses listed in 
Table P7.7. 


(d) 


Min x 


10000 

x x x\ 


subject to the constraints 


x\ + x\ = 100 and x = Xl 

l X2 . 


(P7.7.6a) 


(P7.7.6b) 


Try the routine “fmincon()” with the initial guesses listed in 
Table P7.7. 

(e) Does the routine work well with all the initial guesses? If not, does it 
matter whether the starting point is inside the admissible region? 

(cf) Note that, in order to solve a maximization problem with “fmincon()”, we have to reverse the sign of the objective function. Note also that the objective functions (P7.7.3a) and (P7.7.5a) have infinitely many minima with the value f(x) = 0 in the admissible region satisfying the constraints.
(cf) One might be disappointed by the reliability of the MATLAB optimization routines on seeing that they may fail to find the optimal solution, depending on the initial guess. But how can any human work be perfect in this world? Such failures reflect the difficulty of nonlinear constrained optimization problems and do not impair the reputation and reliability of MATLAB. Actually, they demonstrate the importance of studying the underlying numerical methods in addition to just getting used to the various MATLAB commands and routines.

Here is a tip for the usage of “fmincon()”: it might be better to use an initial guess that is not at the origin but in the admissible region satisfying the constraints, even though this does not guarantee the success of the routine. It might also be helpful to apply the routine with several values of the initial guess and then choose the best result.
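The several-initial-guesses tip can be sketched in Python as follows, using the objective (P7.2.1) and the eleven starting points of Table P7.2.2. Plain fixed-step gradient descent stands in for fmincon()/fminunc(); the step 0.005 and the 4000 iterations are our own choices, and any local minimizer could be plugged into the same loop. The run from each start is kept, and the best final point is chosen.

```python
import math


def f(x):
    """Objective function (P7.2.1)."""
    x1, x2 = x
    return (x1**4 - 12*x1**2 - 4*x1 + x2**4 - 16*x2**2 - 5*x2
            - 20*math.cos(x1 - 2.5)*math.cos(x2 - 2.9))


def grad(x):
    """Gradient (P7.2.2)."""
    x1, x2 = x
    return (4*x1**3 - 24*x1 - 4 + 20*math.sin(x1 - 2.5)*math.cos(x2 - 2.9),
            4*x2**3 - 32*x2 - 5 + 20*math.cos(x1 - 2.5)*math.sin(x2 - 2.9))


def local_descent(x, step=0.005, iters=4000):
    """Fixed-step gradient descent, a stand-in for any local minimizer."""
    for _ in range(iters):
        g = grad(x)
        x = (x[0] - step*g[0], x[1] - step*g[1])
    return x


# Initial points of Table P7.2.2
starts = [(0, 0), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1),
          (0, -1), (1, -1), (2, 2), (-2, -2)]
results = [local_descent((float(a), float(b))) for a, b in starts]
best = min(results, key=f)          # keep the best of all local results
print("best point:", best, "f =", f(best))
```

Individual runs may stop at different local minima, but taking the best over all starts recovers the global minimum (3) of Table P7.2.1.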

7.8 Constrained Optimization and Penalty Method 

Consider again the constrained minimization problem having the objective 
function (E7.3.1a) and the constraints (E7.3.1b). 


-X\ 


-o- 

-x 2 


0 

3xi — xix 2 + 4 x 2 — 1 

< 

0 

2x\ + x 2 — 3 


0 

3xi — 4x\ — 4x 2 


_ 0 _ 


(P7.8.1a) 


(P7.8.1b) 


In Example 7.3, we made the MATLAB program “nm722.m” to solve the problem and defined the objective function (E7.3.2a), including the penalized constraint terms, in the file named “f722p.m”.


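$$\min_{\mathbf{x}} f(\mathbf{x}) = \{(x_1 + 1.5)^2 + 5(x_2 - 1.7)^2\}\{(x_1 - 1.4)^2 + 0.6(x_2 - 0.5)^2\} + \sum_{m=1}^{5} v_m \psi_m(g_m(\mathbf{x})) \tag{P7.8.2a}$$

where

$$\psi_m(g_m(\mathbf{x})) = \begin{cases} 0 & \text{if } g_m(\mathbf{x}) \le 0 \text{ (constraint satisfied)} \\ \exp(e_m g_m(\mathbf{x})) & \text{if } g_m(\mathbf{x}) > 0 \text{ (constraint violated)} \end{cases} \quad \text{with } e_m = 1\ \forall\, m = 1, \dots, 5 \tag{P7.8.2b}$$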
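The penalty term (P7.8.2b) is easy to reuse for other problems. Below is a minimal Python sketch of a helper, given the hypothetical name `penalized`, that wraps an objective and a list of constraint functions g_m(x) ≤ 0 into the unconstrained form (P7.8.2a). Note that ψ_m jumps from 0 to exp(0) = 1 as g_m crosses zero, so the penalized objective is discontinuous on the constraint boundaries.

```python
import math


def penalized(f, g_list, v, e=None):
    """Return x -> f(x) + sum_m v_m * psi_m(g_m(x)) as in (P7.8.2).

    psi_m is 0 when g_m(x) <= 0 and exp(e_m * g_m(x)) when g_m(x) > 0."""
    if e is None:
        e = [1.0] * len(g_list)            # e_m = 1 for all m, as in (P7.8.2b)

    def fp(x):
        total = f(x)
        for g, vm, em in zip(g_list, v, e):
            gm = g(x)
            if gm > 0:                     # constraint violated: add weighted penalty
                total += vm * math.exp(em * gm)
        return total

    return fp


# Toy usage (our own example): minimize (x - 3)^2 subject to x - 1 <= 0
fp = penalized(lambda x: (x[0] - 3) ** 2, [lambda x: x[0] - 1], v=[1.0])
print(fp([0.5]), fp([2.0]))
```

Raising an entry of v makes a violation of the corresponding constraint more expensive, which is the lever part (b) asks you to use.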

(a) What is the weighting coefficient vector v in the file named “f722p.m”? Do the points reached by the routines “fminsearch()”/“opt_steep()”/“fminunc()” satisfy all the constraints, so that they are in the admissible region? If not, specify the constraint(s) violated by the points.

(b) Suppose the fourth constraint was violated by the point in (a). Then, how would you modify the weighting coefficient vector v so that the violated constraint is given more weight? Choose one of the following two weighting coefficient vectors:
(i) v = [1 1 1 1/3 1]

(ii) v = [1 1 1 3 1]

and modify the file “f722p.m” with this coefficient vector. Then, run the program “nm722.m”, fill in the 22 blanks of Table P7.8 with the results, and see whether the fourth constraint is still violated by the points reached by the optimization routines.

(c) Instead of the penalty method, apply the intrinsically constrained optimization routine “fmincon()” with the initial guesses x0 = [0.4 0.5] and [0.2 4] to solve the problem described by Eq. (E7.3.1) or (P7.8.1), and fill in Table P7.8 with the results concerning the reached point and the corresponding values of the objective/constraint functions.

(d) Based on the results listed in Table P7.8, circle the right word in each of the parentheses in the following sentences:

• For penalty methods, the non-gradient-based minimization routines like “Nelder()”/“fminsearch()” may work (better, worse) than the gradient-based minimization routines like “opt_steep()”/“fminunc()”.

• If some constraint is violated, you had better (increase, decrease) the 
corresponding weight coefficient. 

(cf) Besides, unconstrained optimization with the penalized constraints in the 
objective function sometimes works better than the constrained optimization 
routine “fmincon()”. 

Table P7.8 The Results of Penalty Methods Depending on the Initial Guess and 
7.9 A Constrained Optimization on Location

A company has three factories located at the points (-16, 4), (6, 5), and (3, -9), respectively, in the x1x2-plane, and the numbers of deliveries to those factories are 5, 6, and 10 per month, respectively (Fig. P7.9). The company plans to build a new warehouse on its site bounded by

$$|x_1 - 1| + |x_2 - 1| \le 2 \tag{P7.9.1}$$

and, in determining the location of the new warehouse, is trying to minimize the monthly mileage of the delivery trucks on the assumption that the distance between two points represents the driving distance.

(a) What is the objective function that must be defined in the program 
“nm7p09.m”? 

(b) What is the statement defining the inequality constraint (P7.9.1)? 

(c) Complete and run the program “nm7p09.m” to get the optimum location of the new warehouse.


function [C,Ceq] = fp_warehouse_c(x) 
C = sum(abs(x - [1 1])) - 2; 

Ceq = []; % No equality constraint 


%nm7p09.m to solve the warehouse location problem
f = 'sqrt([sum((x - [-16 4]).^2) sum((x - [6 5]).^2) sum((????????).^2)])';
fp_warehouse = inline([f '*[?;?;?]'],'x');
x0 = [1 1]; A = []; b = []; Aeq = []; beq = []; l = []; u = [];
xo = fmincon(fp_warehouse,x0,A,b,Aeq,beq,l,u,'fp_warehouse_c')


Figure P7.9 The site of a new warehouse and the locations of the factories.
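Before running fmincon(), a crude grid search over the site gives an independent reference point. The Python sketch below is our own check, and the grid spacing of 0.01 is arbitrary. It weights each straight-line distance by the number of monthly deliveries and scans the diamond-shaped site (P7.9.1).

```python
import math

factories = [(-16.0, 4.0), (6.0, 5.0), (3.0, -9.0)]   # factory locations
deliveries = [5, 6, 10]                               # deliveries per month


def mileage(x1, x2):
    """Monthly driving distance: each straight-line distance weighted by its delivery count."""
    return sum(w * math.hypot(x1 - p1, x2 - p2)
               for (p1, p2), w in zip(factories, deliveries))


def on_site(x1, x2):
    """Site constraint (P7.9.1): |x1 - 1| + |x2 - 1| <= 2 (with a tiny tolerance)."""
    return abs(x1 - 1) + abs(x2 - 1) <= 2 + 1e-12


h = 0.01
grid = [(-1 + i * h, -1 + j * h) for i in range(401) for j in range(401)]
best = min((p for p in grid if on_site(*p)), key=lambda p: mileage(*p))
print("approximate optimum:", best, "monthly mileage:", mileage(*best))
```

Because the objective is convex and the site is convex, any local optimum fmincon() reports should agree with this grid result to within the grid spacing.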
7.10 A Constrained Optimization on Ray Refraction

A light ray follows the path that takes the shortest time when it travels in space. We want to find the three angles θ1, θ2, and θ3 (measured between the ray and the normal to the material surface) of a ray traveling from P = (0, 0) to Q = (L, -(d1 + d2 + d3)) through a transparent material of thickness d2 and index of refraction n, as depicted in Fig. P7.10. Note the following things.

• Since the speed of light in the transparent material is v = c/n (c is the speed of light in free space), the traveling time to be minimized can be expressed as

$$\min f(\boldsymbol{\theta}, \mathbf{d}, n, L) = \frac{d_1}{c\cos\theta_1} + \frac{n\,d_2}{c\cos\theta_2} + \frac{d_3}{c\cos\theta_3} \tag{P7.10.1}$$

• The sum of the three horizontal distances traveled by the light ray must be L:

$$g(\boldsymbol{\theta}, \mathbf{d}, n, L) = \sum_{i=1}^{3} d_i\tan\theta_i - L = 0 \tag{P7.10.2}$$

• The horizontal distance L and the index of refraction n are additionally included in the input argument lists of both the objective function f(θ, d, n, L) and the constraint function g(θ, d, n, L), regardless of whether or not they are used in each function. This is because the objective function and the constraint function of the MATLAB routine “fmincon()” must have the same input arguments.

(a) Compose a program “nm7p10a.m” that solves the above constrained minimization problem to find the three angles θ1, θ2, and θ3 for n = 1.52, d1 = d2 = d3 = 1 [cm], and different values of L = 0.6:0.3:6, and plots sin(θ1)/sin(θ2) and sin(θ3)/sin(θ2) versus L.

(b) Compose a program “nm7p10b.m” that finds the three angles θ1, θ2, and θ3 for L = 3 cm, d1 = d2 = d3 = 1 cm, and different values of n = 1:0.01:1.6, and plots sin(θ1)/sin(θ2) and sin(θ3)/sin(θ2) versus n.
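The Python sketch below previews what the plots in (a) and (b) should show. It is not the fmincon()-based program requested and makes two assumptions of our own. First, c = 1, since scaling the time does not move the minimizer. Second, because d1 = d3 and both outer layers are the same medium, the optimal path is symmetric, so θ3 = θ1 and the problem reduces to one variable: the horizontal offset a at which the ray enters the slab. The travel time is minimized over a with a golden-section search (a hypothetical helper `golden_min`). The ratio sin(θ1)/sin(θ2) then comes out equal to n, which is Snell's law.

```python
import math


def travel_time(a, L, n, d=1.0, c=1.0):
    """Time for the symmetric path entering the slab at horizontal offset a
    and leaving it at L - a, with d1 = d2 = d3 = d."""
    return (2.0 * math.hypot(d, a) + n * math.hypot(d, L - 2.0 * a)) / c


def golden_min(func, lo, hi, tol=1e-10):
    """Golden-section search for the minimizer of a unimodal func on [lo, hi]."""
    r = (math.sqrt(5) - 1) / 2
    c, dd = hi - r * (hi - lo), lo + r * (hi - lo)
    fc, fd = func(c), func(dd)
    while hi - lo > tol:
        if fc < fd:
            hi, dd, fd = dd, c, fc
            c = hi - r * (hi - lo)
            fc = func(c)
        else:
            lo, c, fc = c, dd, fd
            dd = lo + r * (hi - lo)
            fd = func(dd)
    return (lo + hi) / 2


n, L, d = 1.52, 3.0, 1.0
a = golden_min(lambda a: travel_time(a, L, n, d), 0.0, L / 2)
theta1 = math.atan2(a, d)           # angle in the outer layers (theta3 = theta1)
theta2 = math.atan2(L - 2 * a, d)   # angle inside the transparent material
print("sin(theta1)/sin(theta2) =", math.sin(theta1) / math.sin(theta2))
```

Setting the derivative of the travel time to zero gives 2 sin θ1 - 2n sin θ2 = 0, which explains the result.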


T 

di 

1 

\P 

Aa light ray speed of light = c 
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r 



speed of light = dn 

d 2 


K\ 

transparent material 

1 


LA 

with refraction index n 

t 
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d 3 



0 3 \ 

Jkl 



\Q 


h«- 
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Figure P7.10 Refraction of a light ray at an air-glass interface. 


7.11 A Constrained Optimization on OFDM System

Consider an OFDM (orthogonal frequency division multiplex) system that has N (= 128) subchannels to assign to four users, operating with noise power N0 and bit error rate (probability of bit error) Pe. To find the average modulation order x_i for each user, Seung-hee, a communication system expert, formulated the following constrained minimization problem:

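$$\min f(\mathbf{x}) = \sum_{i=1}^{4} \frac{a_i}{x_i}\,\frac{2^{x_i} - 1}{3}\,N_0 \cdot 2\left(\operatorname{erfc}^{-1}(P_e/2)\right)^2 \tag{P7.11.1}$$

subject to

$$g(\mathbf{x}) = \sum_{i=1}^{4} \frac{a_i}{x_i} - N = 0 \tag{P7.11.2}$$

with N = 128, and a_i: the data rate of each user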

where erfc^-1(x) is the inverse function of the complementary error function defined by Eq. (P4.9.3) and is available as the MATLAB built-in function ‘erfcinv()’. He defined the objective function and the constraint function as below and saved them in the M-files named “fp_bits1.m” and “fp_bits_c.m”.


function y = fp_bits1(x,a,N,Pe)
N0 = 1; y = sum((2.^x-1)*N0/3*2*erfcinv(Pe/2).^2.*a./x);

function [C,Ceq] = fp_bits_c(x,a,N,Pe)
C = []; Ceq = sum(a./x) - N;


Compose a program that solves the above constrained minimization problem (with N0 = 1 and Pe = 10^-4) to get the modulation order x_i of each user for five different sets of data rates

a = [32 32 32 32], [64 32 32 32], [128 32 32 32], [256 32 32 32], and [512 32 32 32]

and plots a1/x1 (the number of subchannels assigned to user 1) versus a1 (the data rate of user 1). If you feel uneasy about the results obtained with your initial guesses, try the following initial guesses for the respective sets of data rates:


x 0 = [0.5 0.5 0.5 0.5], [1 1 1 1], [1 1 1 1], [2 2 2 2], and [4 4 4 4] 





MATRICES AND 
EIGENVALUES 


In this chapter, we will look at the eigenvalue or characteristic value λ and its corresponding eigenvector or characteristic vector v of a matrix.


8.1 EIGENVALUES AND EIGENVECTORS

The eigenvalue or characteristic value and its corresponding eigenvector or characteristic vector of an N × N matrix A are defined as a scalar λ and a nonzero vector v satisfying

$$A\mathbf{v} = \lambda\mathbf{v} \;\Longleftrightarrow\; (A - \lambda I)\mathbf{v} = \mathbf{0} \quad (\mathbf{v} \ne \mathbf{0}) \tag{8.1.1}$$

where (λ, v) is called an eigenpair; an N × N matrix A has N eigenvalues, counted with multiplicity.

How do we get them? Noting that

• for the above equation to hold for some nonzero vector v, the matrix [A - λI] must be singular, that is, its determinant must be zero (|A - λI| = 0), and

• the determinant of the matrix [A - λI] is a polynomial of degree N in λ,

we first must find the eigenvalues λi by solving the so-called characteristic equation

$$|A - \lambda I| = (-1)^N\left(\lambda^N + a_{N-1}\lambda^{N-1} + \cdots + a_1\lambda + a_0\right) = 0 \tag{8.1.2}$$


Applied Numerical Methods Using MATLAB ®, by Yang, Cao, Chung, and Morris 
and then substitute the λi's, one by one, into Eq. (8.1.1) and solve it for the eigenvectors vi. This is, however, not always so simple, especially if some root (eigenvalue) of Eq. (8.1.2) has multiplicity k > 1. In that case we have to look for k independent eigenvectors satisfying Eq. (8.1.1) for such an eigenvalue, and there may be fewer than k of them. Still, we do not have to worry about this, thanks to the MATLAB built-in routine “eig()”, which finds all the eigenvalues and their corresponding eigenvectors for a given matrix. How do we use it? All we need to do is define the matrix, say A, and type a single statement into the MATLAB command window as follows.

>>[V,Lambda] = eig(A) %e = eig(A) just for eigenvalues

Let us take a look at the following example. 


Example 8.1. Eigenvalues/Eigenvectors of a Matrix. 

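Let us find the eigenvalues/eigenvectors of the matrix

$$A = \begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix} \tag{E8.1.1}$$

First, we find its eigenvalues as

$$|A - \lambda I| = \begin{vmatrix} -\lambda & 1 \\ 0 & -1 - \lambda \end{vmatrix} = \lambda(\lambda + 1) = 0, \qquad \lambda_1 = 0,\; \lambda_2 = -1 \tag{E8.1.2}$$

and then get the corresponding eigenvectors as

$$[A - \lambda_1 I]\mathbf{v}_1 = \begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} v_{11} \\ v_{21} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad v_{21} = 0, \quad \mathbf{v}_1 = \begin{bmatrix} v_{11} \\ v_{21} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \tag{E8.1.3a}$$

$$[A - \lambda_2 I]\mathbf{v}_2 = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} v_{12} \\ v_{22} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad v_{12} = -v_{22}, \quad \mathbf{v}_2 = \begin{bmatrix} v_{12} \\ v_{22} \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix} \tag{E8.1.3b}$$

where we have chosen v11, v12, and v22 so that the norms of the eigenvectors become one.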

Alternatively, we can use the MATLAB command “eig(A)” to find the eigenvalues/eigenvectors, or “roots(poly(A))” to find just the eigenvalues as the roots of the characteristic equation, as illustrated by the program “nm811.m”.
%nm811 to get the eigenvalues & eigenvectors of a matrix A.
clear
A = [0 1;0 -1];
[V,L] = eig(A) %V = modal matrix composed of eigenvectors
% L = diagonal matrix with eigenvalues on its diagonal
e = eig(A), roots(poly(A)) %just for eigenvalues
L = V\A*V %diagonalize through similarity transformation (same as inv(V)*A*V)
% into a diagonal matrix having the eigenvalues on diagonal.
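What eig() does for this 2 × 2 example can be reproduced by hand. The Python sketch below uses a hypothetical helper `eig2`, valid only for real 2 × 2 matrices with distinct real eigenvalues. It solves the characteristic equation λ^2 - tr(A)λ + det(A) = 0 and then, for each root, picks a nonzero solution of (A - λI)v = 0, normalized to unit length as in (E8.1.3). MATLAB's eig() may order the eigenvalues or choose the eigenvector signs differently.

```python
import math


def eig2(A):
    """Eigenvalues and unit eigenvectors of a real 2x2 matrix with distinct real eigenvalues.

    The eigenvalues are the roots of lambda^2 - tr(A)*lambda + det(A) = 0,
    the characteristic equation (8.1.2) written out for N = 2."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc <= 0:
        raise ValueError("this sketch handles only distinct real eigenvalues")
    lams = [(tr + math.sqrt(disc)) / 2, (tr - math.sqrt(disc)) / 2]
    vecs = []
    for lam in lams:
        # A nonzero solution of (A - lam*I) v = 0, built from a nonzero row of A - lam*I
        if abs(b) > 1e-12 or abs(a - lam) > 1e-12:
            v = (b, lam - a)          # orthogonal to the first row (a - lam, b)
        else:
            v = (lam - d, c)          # orthogonal to the second row (c, d - lam)
        norm = math.hypot(*v)
        vecs.append((v[0] / norm, v[1] / norm))   # unit norm, as in (E8.1.3)
    return lams, vecs


A = [[0.0, 1.0], [0.0, -1.0]]         # the matrix (E8.1.1)
lams, vecs = eig2(A)
print("eigenvalues:", lams)
print("unit eigenvectors:", vecs)
```

For larger matrices, solving the characteristic polynomial is numerically fragile, which is one reason eig() relies on other algorithms.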


8.2 SIMILARITY TRANSFORMATION AND DIAGONALIZATION 

Premultiplying a matrix A by P^-1 and post-multiplying it by P makes a similarity transformation

$$A \rightarrow P^{-1}AP \tag{8.2.1}$$

Remark 8.1 tells us how a similarity transformation affects the eigenvalues/eigenvectors.

Remark 8.1. Effect of Similarity Transformation on Eigenvalues/Eigenvectors 

1. The eigenvalues are not changed by a similarity transformation:

$$|P^{-1}AP - \lambda I| = |P^{-1}AP - P^{-1}\lambda IP| = |P^{-1}|\,|A - \lambda I|\,|P| = |A - \lambda I| \tag{8.2.2}$$

2. Substituting v = Pw into Eq. (8.1.1) yields

$$A\mathbf{v} = \lambda\mathbf{v}, \quad AP\mathbf{w} = \lambda P\mathbf{w} = P\lambda\mathbf{w}, \quad [P^{-1}AP]\mathbf{w} = \lambda\mathbf{w}$$

This implies that the matrix P^-1 AP obtained by a similarity transformation has w = P^-1 v as its eigenvector if v is an eigenvector of the matrix A.


In order to understand the diagonalization of a matrix into a diagonal matrix 
(having its eigenvalues on the main diagonal) through a similarity transformation, 
we have to know the following theorem: 

Theorem 8.1. Distinct Eigenvalues and Independent Eigenvectors. 

If the eigenvalues of a matrix A are all distinct—that is, different from each 
other—then the corresponding eigenvectors are independent of each other and, 
consequently, the modal matrix composed of the eigenvectors as columns is 
nonsingular. 


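Now, for an N × N matrix A whose eigenvalues are all distinct, let us put all of the equations (8.1.1) for each eigenvalue-eigenvector pair together to write

$$A[\mathbf{v}_1\; \mathbf{v}_2\; \cdots\; \mathbf{v}_N] = [\mathbf{v}_1\; \mathbf{v}_2\; \cdots\; \mathbf{v}_N]\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_N \end{bmatrix}, \qquad AV = V\Lambda \tag{8.2.3}$$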
Then, noting that the modal matrix V is nonsingular and invertible by Theorem 8.1, we can premultiply the above equation by the inverse modal matrix V^-1 to get

$$V^{-1}AV = V^{-1}V\Lambda = \Lambda \tag{8.2.4}$$

This implies that the modal matrix composed of the eigenvectors of a matrix A is a similarity transformation matrix that converts A into a diagonal matrix having its eigenvalues on the main diagonal. Here is an example to illustrate the diagonalization.

Example 8.2. Diagonalization Using the Modal Matrix. 

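Consider the matrix given in the previous example.

$$A = \begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix} \tag{E8.2.1}$$

We can use the eigenvectors (E8.1.3) (obtained in Example 8.1) to construct the modal matrix as

$$V = [\mathbf{v}_1\; \mathbf{v}_2] = \begin{bmatrix} 1 & 1/\sqrt{2} \\ 0 & -1/\sqrt{2} \end{bmatrix} \tag{E8.2.2}$$

and use this matrix to make a similarity transformation of the matrix A as

$$V^{-1}AV = \begin{bmatrix} 1 & 1 \\ 0 & -\sqrt{2} \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & 1/\sqrt{2} \\ 0 & -1/\sqrt{2} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & -\sqrt{2} \end{bmatrix}\begin{bmatrix} 0 & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix} \tag{E8.2.3}$$

which is a diagonal matrix having the eigenvalues on its main diagonal.

This job can be performed by the last statement of the MATLAB program “nm811.m”.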
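The similarity transformation (E8.2.3) can also be checked numerically. This Python sketch uses plain lists and the hypothetical helpers `matmul` and `inv2` to form V^-1 A V with the modal matrix (E8.2.2). Up to rounding, it reproduces diag(0, -1).

```python
import math


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inv2(M):
    """Inverse of a nonsingular 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


s = 1 / math.sqrt(2)
A = [[0.0, 1.0], [0.0, -1.0]]      # (E8.2.1)
V = [[1.0, s], [0.0, -s]]          # modal matrix (E8.2.2): eigenvectors as columns
Lam = matmul(matmul(inv2(V), A), V)
print(Lam)
```

The intermediate product inv2(V) equals [[1, 1], [0, -√2]], the same V^-1 that appears in (E8.2.3).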


This diagonalization technique can be used to decouple an N-dimensional vector differential equation so that it becomes as easy to solve as N independent scalar differential equations. Here is an illustration.


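Example 8.3. Decoupling of a Vector Equation Through Diagonalization.

(a) For the linear time-invariant (LTI) state equation (6.5.3)

$$\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix}u_s(t) \quad \text{with} \quad \begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \text{ and } u_s(t) = 1\ \forall\, t \ge 0 \tag{E8.3.1}$$

that is, x'(t) = Ax(t) + Bu(t) with the initial state x(0) and the input u(t),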
we use the modal matrix obtained as (E8.2.2) in Example 8.2 to make a substitution of variable

$$\begin{bmatrix}x_1(t)\\ x_2(t)\end{bmatrix} = \begin{bmatrix}1 & 1/\sqrt{2}\\ 0 & -1/\sqrt{2}\end{bmatrix}\begin{bmatrix}w_1(t)\\ w_2(t)\end{bmatrix},\qquad \mathbf{x}(t) = V\mathbf{w}(t) \tag{E8.3.2}$$


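which converts Eq. (E8.3.1) into

$$V\mathbf{w}'(t) = AV\mathbf{w}(t) + Bu_s(t) \tag{E8.3.3}$$

We premultiply (E8.3.3) by $V^{-1}$ to write it in a decoupled form as

$$\mathbf{w}'(t) = V^{-1}AV\mathbf{w}(t) + V^{-1}Bu_s(t) = \Lambda\mathbf{w}(t) + V^{-1}Bu_s(t)\ \text{with}\ \mathbf{w}(0) = V^{-1}\mathbf{x}(0);$$

$$\begin{bmatrix}w_1'(t)\\ w_2'(t)\end{bmatrix} = \begin{bmatrix}0 & 0\\ 0 & -1\end{bmatrix}\begin{bmatrix}w_1(t)\\ w_2(t)\end{bmatrix} + \begin{bmatrix}1 & 1\\ 0 & -\sqrt{2}\end{bmatrix}\begin{bmatrix}0\\ 1\end{bmatrix}u_s(t) = \begin{bmatrix}u_s(t)\\ -w_2(t) - \sqrt{2}\,u_s(t)\end{bmatrix} \tag{E8.3.4}$$

$$\begin{bmatrix}w_1(0)\\ w_2(0)\end{bmatrix} = \begin{bmatrix}1 & 1\\ 0 & -\sqrt{2}\end{bmatrix}\begin{bmatrix}1\\ -1\end{bmatrix} = \begin{bmatrix}0\\ \sqrt{2}\end{bmatrix}$$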

where there is no coupling between the variables $w_1(t)$ and $w_2(t)$. Then we can solve these two equations separately to have

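$$w_1'(t) = u_s(t)\ \text{with}\ w_1(0) = 0;\quad sW_1(s) - w_1(0) = \frac{1}{s};\quad W_1(s) = \frac{1}{s^2};\quad w_1(t) = t\,u_s(t) \tag{E8.3.5a}$$

$$w_2'(t) = -w_2(t) - \sqrt{2}\,u_s(t)\ \text{with}\ w_2(0) = \sqrt{2};\quad sW_2(s) - w_2(0) = -W_2(s) - \frac{\sqrt{2}}{s};$$

$$W_2(s) = \frac{\sqrt{2}}{s+1} - \frac{\sqrt{2}}{s(s+1)} = -\frac{\sqrt{2}}{s} + \frac{2\sqrt{2}}{s+1};\quad w_2(t) = \sqrt{2}\,(-1 + 2e^{-t})\,u_s(t) \tag{E8.3.5b}$$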

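and substitute this into Eq. (E8.3.2) to get

$$\begin{bmatrix}x_1(t)\\ x_2(t)\end{bmatrix} = \begin{bmatrix}1 & 1/\sqrt{2}\\ 0 & -1/\sqrt{2}\end{bmatrix}\begin{bmatrix}t\\ \sqrt{2}(-1 + 2e^{-t})\end{bmatrix}u_s(t) = \begin{bmatrix}t - 1 + 2e^{-t}\\ 1 - 2e^{-t}\end{bmatrix}u_s(t) \tag{E8.3.6}$$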


This is the same result as Eq. (6.5.10) obtained in Section 6.5.1. 
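One way to gain confidence in the decoupled solution (E8.3.6) is to integrate the original coupled equation (E8.3.1) numerically and compare the two. The sketch below does this in plain Python with a classical fourth-order Runge-Kutta loop. The step size, the end time, and the helper names `x_modal`, `f` and `rk4` are our own choices for illustration.

```python
import math

A = [[0.0, 1.0], [0.0, -1.0]]   # system matrix of Eq. (E8.3.1)
B = [0.0, 1.0]                  # input vector; u_s(t) = 1 for t >= 0

def x_modal(t):
    """x(t) = V w(t), w1 = t, w2 = sqrt(2)(-1 + 2e^{-t}); Eqs. (E8.3.5), (E8.3.6)."""
    r = 1 / math.sqrt(2)
    w1 = t
    w2 = math.sqrt(2) * (-1 + 2 * math.exp(-t))
    return [w1 + r * w2, -r * w2]

def f(x):
    """Right-hand side A x + B u_s of the original (coupled) state equation."""
    return [A[i][0] * x[0] + A[i][1] * x[1] + B[i] for i in range(2)]

def rk4(x, h, steps):
    """Classical fourth-order Runge-Kutta integration of x' = f(x)."""
    for _ in range(steps):
        k1 = f(x)
        k2 = f([x[i] + h / 2 * k1[i] for i in range(2)])
        k3 = f([x[i] + h / 2 * k2[i] for i in range(2)])
        k4 = f([x[i] + h * k3[i] for i in range(2)])
        x = [x[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2)]
    return x

t_end, h = 2.0, 1e-3
x_num = rk4([1.0, -1.0], h, round(t_end / h))   # x(0) = [1, -1]
print(x_num, x_modal(t_end))
```

The two results agree to roughly machine precision. The modal route, however, yields a closed-form expression instead of a table of numbers.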
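(b) Suppose Eq. (E8.3.1) has no input term, so that we can expect only the natural response resulting from the initial state, but no forced response caused by the input.

$$\begin{bmatrix}x_1'(t)\\ x_2'(t)\end{bmatrix} = \begin{bmatrix}0 & 1\\ 0 & -1\end{bmatrix}\begin{bmatrix}x_1(t)\\ x_2(t)\end{bmatrix}\quad\text{with}\quad \begin{bmatrix}x_1(0)\\ x_2(0)\end{bmatrix} = \begin{bmatrix}1\\ 1\end{bmatrix} \tag{E8.3.7}$$

We apply the diagonalization/decoupling method for this equation to get

$$\begin{bmatrix}w_1'(t)\\ w_2'(t)\end{bmatrix} = \begin{bmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{bmatrix}\begin{bmatrix}w_1(t)\\ w_2(t)\end{bmatrix} = \begin{bmatrix}0 & 0\\ 0 & -1\end{bmatrix}\begin{bmatrix}w_1(t)\\ w_2(t)\end{bmatrix},\qquad \mathbf{w}(0) = V^{-1}\mathbf{x}(0) = \begin{bmatrix}1 & 1\\ 0 & -\sqrt{2}\end{bmatrix}\begin{bmatrix}1\\ 1\end{bmatrix} = \begin{bmatrix}2\\ -\sqrt{2}\end{bmatrix}$$

$$\begin{bmatrix}w_1(t)\\ w_2(t)\end{bmatrix} = \begin{bmatrix}e^{\lambda_1 t}w_1(0)\\ e^{\lambda_2 t}w_2(0)\end{bmatrix} = \begin{bmatrix}2\\ -\sqrt{2}\,e^{-t}\end{bmatrix} \tag{E8.3.8}$$

$$\mathbf{x}(t) \overset{(E8.3.2)}{=} V\mathbf{w}(t) = [\mathbf{v}_1\ \mathbf{v}_2]\begin{bmatrix}w_1(0)e^{\lambda_1 t}\\ w_2(0)e^{\lambda_2 t}\end{bmatrix} = w_1(0)e^{\lambda_1 t}\mathbf{v}_1 + w_2(0)e^{\lambda_2 t}\mathbf{v}_2 = \begin{bmatrix}2 - e^{-t}\\ e^{-t}\end{bmatrix} \tag{E8.3.9}$$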


As time goes by, this solution converges: the mode for $\lambda_1 = 0$ stays constant, while the mode for $\lambda_2 = -1$ decays. The continuous-time system therefore turns out to be (marginally) stable, thanks to the fact that the eigenvalues (0, -1) are distinct and not positive.
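The modal form (E8.3.9) can be evaluated directly for any initial state. The following plain-Python sketch computes $\mathbf{w}(0) = V^{-1}\mathbf{x}(0)$ and sums the two modes. It then checks with a central difference that the result satisfies $\mathbf{x}' = A\mathbf{x}$, and shows that the state settles at $w_1(0)\mathbf{v}_1$. The initial state [1, 1] is our reading of the garbled (E8.3.7); any other vector can be passed to the hypothetical helper `x_natural`.

```python
import math

A = [[0.0, 1.0], [0.0, -1.0]]               # system matrix of Eq. (E8.3.7)
lam = [0.0, -1.0]                           # its eigenvalues
r = 1 / math.sqrt(2)
v = [[1.0, 0.0], [r, -r]]                   # eigenvectors v1, v2 (columns of V, Eq. (E8.2.2))
Vinv = [[1.0, 1.0], [0.0, -math.sqrt(2)]]   # V^{-1}, as in Example 8.2

def x_natural(t, x0):
    """Eq. (E8.3.9): x(t) = sum_n w_n(0) e^{lam_n t} v_n with w(0) = V^{-1} x(0)."""
    w0 = [Vinv[i][0] * x0[0] + Vinv[i][1] * x0[1] for i in range(2)]
    return [sum(w0[n] * math.exp(lam[n] * t) * v[n][i] for n in range(2))
            for i in range(2)]

x0 = [1.0, 1.0]          # initial state (our reading of Eq. (E8.3.7))
t, d = 0.7, 1e-5
# central-difference check that x' = A x holds at time t
dx = [(p - q) / (2 * d) for p, q in zip(x_natural(t + d, x0), x_natural(t - d, x0))]
x = x_natural(t, x0)
residual = [dx[i] - (A[i][0] * x[0] + A[i][1] * x[1]) for i in range(2)]
print(residual, x_natural(50.0, x0))     # residual ~ 0; state settles at w1(0) v1
```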


Example 8.4. Decoupling of a Vector Equation Through Diagonalization. 
Consider a discrete-time LTI state equation 



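$$\begin{bmatrix}x_1[n+1]\\ x_2[n+1]\end{bmatrix} = \begin{bmatrix}0 & 1\\ 0.2 & 0.1\end{bmatrix}\begin{bmatrix}x_1[n]\\ x_2[n]\end{bmatrix} + \begin{bmatrix}0\\ 2.2361\end{bmatrix}u_s[n]\ \text{with the initial state}\ \mathbf{x}[0]\ \text{and}\ u_s[n] = 1\ \forall\, n \ge 0 \tag{E8.4.1}$$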

In order to diagonalize this equation into a form similar to Eq. (E8.3.4), we use 
MATLAB to find the eigenvalues/eigenvectors and the modal matrix composed 
of the eigenvectors and finally, do the similarity transformation. 


A = [0 1;0.2 0.1]; B = [0; 2.2361]; % Eq.(E8.4.1)
[V,L] = eig(A) % V = modal matrix composed of eigenvectors (E8.4.2)
               % L = diagonal matrix with eigenvalues on its diagonal
Ap = V^-1*A*V % diagonalize through similarity transformation (E8.4.3)
              % into a diagonal matrix having the eigenvalues on the diagonal
Bp = V^-1*B % (E8.4.3)
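Then, we get

$$\Lambda = \begin{bmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{bmatrix} = \begin{bmatrix}-0.4 & 0\\ 0 & 0.5\end{bmatrix},\qquad V = [\mathbf{v}_1\ \mathbf{v}_2] = \begin{bmatrix}-0.9285 & -0.8944\\ 0.3714 & -0.4472\end{bmatrix} \tag{E8.4.2}$$

$$A_p = V^{-1}AV = \begin{bmatrix}-0.4 & 0\\ 0 & 0.5\end{bmatrix}\quad\text{and}\quad B_p = V^{-1}B = \begin{bmatrix}2.6759\\ -2.7778\end{bmatrix} \tag{E8.4.3}$$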


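so that we can write the diagonalized state equation as

$$\begin{bmatrix}w_1[n+1]\\ w_2[n+1]\end{bmatrix} = \begin{bmatrix}-0.4 & 0\\ 0 & 0.5\end{bmatrix}\begin{bmatrix}w_1[n]\\ w_2[n]\end{bmatrix} + \begin{bmatrix}2.6759\\ -2.7778\end{bmatrix} = \begin{bmatrix}-0.4\,w_1[n] + 2.6759\\ 0.5\,w_2[n] - 2.7778\end{bmatrix} \tag{E8.4.4}$$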


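Without the input term on the right-hand side of Eq. (E8.4.1), we would have obtained

$$\begin{bmatrix}w_1[n+1]\\ w_2[n+1]\end{bmatrix} = \begin{bmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{bmatrix}\begin{bmatrix}w_1[n]\\ w_2[n]\end{bmatrix} = \begin{bmatrix}\lambda_1^{n+1}w_1[0]\\ \lambda_2^{n+1}w_2[0]\end{bmatrix}\ \text{with}\ \mathbf{w}[0] = V^{-1}\mathbf{x}[0] \tag{E8.4.5}$$

$$\mathbf{x}[n] = V\mathbf{w}[n] = [\mathbf{v}_1\ \mathbf{v}_2]\begin{bmatrix}w_1[0]\lambda_1^n\\ w_2[0]\lambda_2^n\end{bmatrix} = w_1[0]\lambda_1^n\mathbf{v}_1 + w_2[0]\lambda_2^n\mathbf{v}_2 \tag{E8.4.6}$$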


As time goes by (i.e., as n increases), this solution converges, and so the discrete-time system turns out to be stable, thanks to the fact that the magnitude of every eigenvalue (-0.4, 0.5) is less than one.
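To see (E8.4.6) at work, this plain-Python sketch runs the homogeneous recursion $\mathbf{x}[n+1] = A\mathbf{x}[n]$ step by step and compares each state with the modal formula. It uses the unnormalized eigenvectors [1, -0.4] and [1, 0.5], because scaling an eigenvector changes $w_n[0]$ but not the product $w_n[0]\mathbf{v}_n$. The initial state [1, -1] is a hypothetical choice, since the one in (E8.4.1) is not legible.

```python
A = [[0.0, 1.0], [0.2, 0.1]]        # system matrix of Eq. (E8.4.1)
lam = [-0.4, 0.5]                    # its eigenvalues
v = [[1.0, -0.4], [1.0, 0.5]]        # unnormalized eigenvectors: A v = lam v

def step(x):
    """One step of the homogeneous recursion x[n+1] = A x[n]."""
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]

def modal(x0, n):
    """Eq. (E8.4.6): x[n] = w1[0] lam1^n v1 + w2[0] lam2^n v2, w[0] = V^{-1} x[0]."""
    a, b = v[0][0], v[1][0]          # V = [v1 v2] = [[a, b], [c, d]]
    c, d = v[0][1], v[1][1]
    det = a * d - b * c
    w0 = [(d * x0[0] - b * x0[1]) / det, (-c * x0[0] + a * x0[1]) / det]
    return [sum(w0[k] * lam[k] ** n * v[k][i] for k in range(2)) for i in range(2)]

x0 = [1.0, -1.0]                     # hypothetical initial state
x, max_err = x0[:], 0.0
for n in range(30):
    xm = modal(x0, n)
    max_err = max(max_err, abs(x[0] - xm[0]), abs(x[1] - xm[1]))
    x = step(x)
print(max_err, x)                    # tiny error; x[30] is close to 0
```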


Remark 8.2. Physical Meaning of Eigenvalues and Eigenvectors 


1. As illustrated by the above examples, we can use the modal matrix to 
decouple a set of differential equations so that they can be solved one 
by one as a scalar differential equation in terms of a single variable and 
then put together to make the solution for the original vector differential 
equation. 

2. Through the above examples, we can feel the physical significance of the eigenvalues/eigenvectors of the system matrix A in the state equation for its solution. That is, the state of a linear time-invariant (LTI) system described by an N-dimensional continuous-time (differential) state equation has N modes $\{e^{\lambda_i t};\ i = 1, \ldots, N\}$. Each of them converges/diverges if the real part of the corresponding eigenvalue is negative/positive, and it proceeds slowly if the magnitude of the eigenvalue is close to zero. In the case of a discrete-time LTI system described by an N-dimensional difference state equation, its state has N modes $\{\lambda_i^n;\ i = 1, \ldots, N\}$. Each of them converges/diverges if the magnitude of the corresponding eigenvalue is less/greater than one, and it proceeds slowly if the magnitude of the eigenvalue is close to one. To summarize, the convergence property of a state x, or the stability of an LTI system, is determined by the eigenvalues of the system matrix A. As illustrated by (E8.3.9) and (E8.4.6), the corresponding eigenvector determines the direction in which each mode proceeds in the N-dimensional state space.


8.3 POWER METHOD 

In this section, we will introduce the scaled power method, the inverse power 
method and the shifted inverse power method, to find the eigenvalues of a 
given matrix. 


8.3.1 Scaled Power Method 

This method is used to find the eigenvalue of largest magnitude and is summarized 
in the following box. 


SCALED POWER METHOD 

Suppose all of the eigenvalues of an N x N matrix A are distinct with the magnitudes

$$|\lambda_1| > |\lambda_2| > \cdots > |\lambda_N|$$

Then, the dominant eigenvalue $\lambda_1$ with the largest magnitude and its corresponding eigenvector $\mathbf{v}_1$ can be obtained by starting with an initial vector $\mathbf{x}_0$ that has some nonzero component in the direction of $\mathbf{v}_1$ and by repeating the following procedure:

Divide the previous vector $\mathbf{x}_k$ by its largest component (in absolute value) for normalization (scaling) and premultiply the normalized vector by the matrix A.

$$\mathbf{x}_{k+1} = A\frac{\mathbf{x}_k}{\|\mathbf{x}_k\|_\infty} \to \lambda_1\mathbf{v}_1\quad\text{with}\quad \|\mathbf{x}\|_\infty = \max_n\{|x_n|\} \tag{8.3.1}$$
Proof. According to Theorem 8.1, the eigenvectors $\{\mathbf{v}_n;\ n = 1 : N\}$ of an N x N matrix A whose eigenvalues are distinct are independent and thus can constitute a basis for an N-dimensional linear space. Consequently, any initial vector $\mathbf{x}_0$ can be expressed as a linear combination of the eigenvectors:

$$\mathbf{x}_0 = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_N\mathbf{v}_N \tag{8.3.2}$$

Noting that $A\mathbf{v}_n = \lambda_n\mathbf{v}_n$, we premultiply both sides of this equation by A to get

$$A\mathbf{x}_0 = a_1\lambda_1\mathbf{v}_1 + a_2\lambda_2\mathbf{v}_2 + \cdots + a_N\lambda_N\mathbf{v}_N$$

and repeat this multiplication over and over again to obtain

$$\mathbf{x}_k = A^k\mathbf{x}_0 = \lambda_1^k\left[a_1\mathbf{v}_1 + a_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\mathbf{v}_2 + \cdots + a_N\left(\frac{\lambda_N}{\lambda_1}\right)^k\mathbf{v}_N\right] \to \lambda_1^k a_1\mathbf{v}_1 \tag{8.3.3}$$

Since $|\lambda_n/\lambda_1| < 1$ for $n \ge 2$, this converges in direction to the eigenvector $\mathbf{v}_1$ as long as $a_1 \ne 0$. Since we keep scaling before multiplying at every iteration, the largest component of the limit vector of the sequence generated by Eq. (8.3.1) must be $\lambda_1$.


Note that the scaling prevents the overflow or underflow that would result from 
|Ai| > 1 or |Ai| < 1. 
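The scaled power method (8.3.1) translates almost line by line into code. Below is a minimal plain-Python sketch that mirrors the logic of the book's MATLAB routine “eig_power()”: scale by the infinity norm, multiply by A, and read $\lambda$ off the largest-magnitude component. The function name `power_method` and the tolerances are our own choices. The test matrix and initial vector are those used in “nm831.m”.

```python
def power_method(A, x, eps=1e-10, max_iter=500):
    """Scaled power method, Eq. (8.3.1): returns (lambda_1, unit eigenvector)."""
    n = len(A)
    lam = 0.0
    for _ in range(max_iter):
        x_old, lam_old = x, lam
        scale = max(abs(xi) for xi in x)                 # ||x||_inf
        x = [sum(A[i][j] * x[j] for j in range(n)) / scale for i in range(n)]
        m = max(range(n), key=lambda i: abs(x[i]))       # index of largest |x_i|
        lam = x[m]
        if max(abs(a - b) for a, b in zip(x, x_old)) < eps and abs(lam - lam_old) < eps:
            break
    norm = sum(xi * xi for xi in x) ** 0.5
    return lam, [xi / norm for xi in x]

A = [[2, 0, 1], [0, -2, 0], [1, 0, 2]]   # matrix used in nm831.m
lam, v = power_method(A, [1.0, 2.0, 3.0])
print(lam, v)
```

For this matrix the ratio $|\lambda_2|/|\lambda_1| = 2/3$ governs the convergence speed (Remark 8.3), so a few dozen iterations are needed.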


Remark 8.3. Convergence of Power Method 

1. In the light of Eq. (8.3.3), the convergence speed of the power method depends on how small the magnitude ratio $|\lambda_2|/|\lambda_1|$ of the second largest eigenvalue $\lambda_2$ to the largest eigenvalue $\lambda_1$ is.

2. We often use $\mathbf{x}_0 = [1\ 1\ \cdots\ 1]^T$ as the initial vector. Suppose that it has no component in the direction of the eigenvector $\mathbf{v}_1$ corresponding to the dominant eigenvalue $\lambda_1$. That is, $a_1 = 0$ in Eq. (8.3.2), or $a_1 = \mathbf{x}_0^T\mathbf{v}_1/\|\mathbf{v}_1\|^2 = 0$ when the eigenvectors are mutually orthogonal, as for a symmetric A. Then the iteration of the scaled power method leads to the limit showing the second largest magnitude eigenvalue $\lambda_2$ and its corresponding eigenvector $\mathbf{v}_2$. But if there is more than one largest (dominant) eigenvalue of equal magnitude, the iteration does not converge to either of them.
8.3.2 Inverse Power Method

The objective of this method is to find the (uniquely) smallest-magnitude eigenvalue $\lambda_N$ by applying the scaled power method to the inverse matrix $A^{-1}$ and taking the inverse of the largest component of the limit. It works only in cases where the matrix A is nonsingular and thus has no zero eigenvalue. Its idea is based on the equation

$$A\mathbf{v} = \lambda\mathbf{v} \;\to\; A^{-1}\mathbf{v} = \lambda^{-1}\mathbf{v} \tag{8.3.5}$$

obtained from multiplying both sides of Eq. (8.1.1) by $\lambda^{-1}A^{-1}$. This implies that the inverse matrix $A^{-1}$ has eigenvalues that are the reciprocals of the eigenvalues of the original matrix A, while still having the same eigenvectors:

$$\lambda_N = \frac{1}{\text{the largest eigenvalue of } A^{-1}} \tag{8.3.6}$$


8.3.3 Shifted Inverse Power Method

In order to develop a method for finding an eigenvalue that is not necessarily of the largest or smallest magnitude, we subtract $s\mathbf{v}$ (s: a number that does not happen to equal any eigenvalue) from both sides of Eq. (8.1.1) to write

$$A\mathbf{v} = \lambda\mathbf{v} \;\to\; [A - sI]\mathbf{v} = (\lambda - s)\mathbf{v} \tag{8.3.7}$$

This implies that $(\lambda - s)$ is an eigenvalue of $[A - sI]$. We therefore apply the inverse power method to $[A - sI]$ to get its smallest-magnitude eigenvalue $(\lambda_m - s)$, where $|\lambda_m - s| = \min\{|\lambda_i - s|,\ i = 1 : N\}$. Adding s to it gives the eigenvalue of the original matrix A that is closest to the number s:

$$\lambda_s = \frac{1}{\text{the largest eigenvalue of } [A - sI]^{-1}} + s \tag{8.3.8}$$

Gerschgorin's disk theorem, summarized in the box below, indicates where the eigenvalues lie and so helps in choosing a shift s close to a desired eigenvalue. But this method is not applicable when more than one eigenvalue is equally close to s, since $[A - sI]^{-1}$ then has more than one dominant eigenvalue of the same magnitude.

Theorem 8.2. Gerschgorin’s Disk Theorem. 

Every eigenvalue of a square matrix A belongs to at least one of the disks (in the complex plane) with center $a_{mm}$ (one of the diagonal elements of A) and radius

$$r_m = \sum_{n \ne m}|a_{mn}|\quad\text{(the sum of the magnitudes of all the elements in the } m\text{th row except the diagonal element)}$$

Moreover, if k of the disks form a connected region that is disjoint from all the other disks, that region contains exactly k eigenvalues of the matrix A. In particular, a disk that is disjoint from all the others contains exactly one eigenvalue.
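A short plain-Python sketch of Theorem 8.2 for the matrix of “nm831.m” follows. It lists the disks and checks that each eigenvalue (3, -2, 1, as reported for this matrix) lies in at least one of them. The disk centered at -2 has radius 0 and is disjoint from the others, so it pins down one eigenvalue exactly; that is the kind of information that helps when choosing the shift s in (8.3.8). The helper names are our own.

```python
def gerschgorin_disks(A):
    """(center, radius) per row: center a_mm, radius sum_{n != m} |a_mn|."""
    return [(A[m][m], sum(abs(a) for n, a in enumerate(A[m]) if n != m))
            for m in range(len(A))]

def in_some_disk(z, disks, tol=1e-12):
    """True if the (possibly complex) number z lies in at least one disk."""
    return any(abs(z - c) <= r + tol for c, r in disks)

A = [[2, 0, 1], [0, -2, 0], [1, 0, 2]]
disks = gerschgorin_disks(A)
eigenvalues = [3.0, -2.0, 1.0]           # eigenvalues of A reported for nm831.m/nm841.m
print(disks, all(in_some_disk(lam, disks) for lam in eigenvalues))
```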

The power method introduced in Section 8.3.1 is cast into the routine “eig_power()”. The MATLAB program “nm831.m” uses it to perform the power method, the inverse power method, and the shifted inverse power method for finding the eigenvalues of a matrix, and it compares the results with those of the MATLAB built-in routine “eig()” for cross-check.


function [lambda,v] = eig_power(A,x,EPS,MaxIter)
% The power method to find the largest eigenvalue (lambda) and
% the corresponding eigenvector (v) of a matrix A.
if nargin < 4, MaxIter = 100; end % maximum number of iterations
if nargin < 3, EPS = 1e-8; end % difference between successive values
N = size(A,2);
if nargin < 2, x = [1:N]; end % the initial vector
x = x(:);
lambda = 0;
for k = 1:MaxIter
   x1 = x; lambda1 = lambda;
   x = A*x/norm(x,inf); %Eq.(8.3.4)
   [xm,m] = max(abs(x));
   lambda = x(m); % the component with largest magnitude(absolute value)
   if norm(x1 - x) < EPS && abs(lambda1-lambda) < EPS, break; end
end
if k == MaxIter, disp('Warning: you may have to increase MaxIter'); end
v = x/norm(x); % normalized eigenvector

%nm831
%Apply the power method to find the largest/smallest/medium eigenvalue
A = [2 0 1;0 -2 0;1 0 2];
x = [1 2 3]'; %x = [1 1 1]'; % with different initial vector
EPS = 1e-8; MaxIter = 100;
%the largest eigenvalue and its corresponding eigenvector
[lambdamax,v] = eig_power(A,x,EPS,MaxIter)
%the smallest eigenvalue and its corresponding eigenvector
[lambda,v] = eig_power(A^-1,x,EPS,MaxIter);
lambda_min = 1/lambda, v %Eq.(8.3.6)
%eigenvalue nearest to a number and its corresponding eigenvector
s = -3; AsI = (A - s*eye(size(A)))^-1;
[lambda,v] = eig_power(AsI,x,EPS,MaxIter);
lambda = 1/lambda+s %Eq.(8.3.8)
fprintf('Eigenvalue closest to %4.2f = %8.4f\nwith eigenvector',s,lambda)
v
[V,LAMBDA] = eig(A) %modal matrix composed of eigenvectors


8.4 JACOBI METHOD 

This method finds all the eigenvalues (and eigenvectors) of a real symmetric matrix. Its idea is based on the following theorem.
Theorem 8.3. Symmetric Diagonalization Theorem.

All of the eigenvalues of an N x N real symmetric matrix A are real, and its eigenvectors can be chosen to form an orthonormal basis of an N-dimensional linear space. Consequently, we can make an orthonormal modal matrix V composed of the eigenvectors such that $V^TV = I$; $V^{-1} = V^T$. We can then use the modal matrix to make the similarity transformation of A, which yields a diagonal matrix having the eigenvalues on its main diagonal:

$$V^TAV = V^{-1}AV = \Lambda \tag{8.4.1}$$


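Now, in order to understand the Jacobi method, we define the pq-rotation matrix as

$$R_{pq}(\theta) = \begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0\\
\vdots & \ddots & \vdots & & \vdots & & \vdots\\
0 & \cdots & \cos\theta & \cdots & -\sin\theta & \cdots & 0\\
\vdots & & \vdots & \ddots & \vdots & & \vdots\\
0 & \cdots & \sin\theta & \cdots & \cos\theta & \cdots & 0\\
\vdots & & \vdots & & \vdots & \ddots & \vdots\\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix} \tag{8.4.2}$$

Here $\cos\theta$ and $-\sin\theta$ occupy the pth and qth columns of the pth row, $\sin\theta$ and $\cos\theta$ occupy the pth and qth columns of the qth row, and all the other elements are the same as those of the identity matrix.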



Since this is an orthonormal matrix whose row/column vectors are orthogonal and normalized,

$$R_{pq}^TR_{pq} = I,\qquad R_{pq}^T = R_{pq}^{-1} \tag{8.4.3}$$

premultiplying/postmultiplying a matrix A by $R_{pq}^T$/$R_{pq}$ makes a similarity transformation

$$A_{(1)} = R_{pq}^TAR_{pq} \tag{8.4.4}$$

Noting that the similarity transformation does not change the eigenvalues (Remark 8.1), any matrix resulting from repeating the same operations successively (starting from $A_{(0)} = A$)

$$A_{(k+1)} = R_{(k)}^TA_{(k)}R_{(k)} = R_{(k)}^TR_{(k-1)}^T\cdots R_{(0)}^T\,A\,R_{(0)}\cdots R_{(k-1)}R_{(k)} \tag{8.4.5}$$

has the same eigenvalues. Moreover, if it is a diagonal matrix, it will have all the eigenvalues on its main diagonal, and the matrix multiplied on the right of the matrix A is the modal matrix V

$$V = R_{(0)}\cdots R_{(k-1)}R_{(k)} \tag{8.4.6}$$

as manifested by matching this equation with Eq. (8.4.1).
function [LAMBDA,V,ermsg] = eig_Jacobi(A,EPS,MaxIter)
%Jacobi method finds the eigenvalues/eigenvectors of symmetric matrix A
if nargin < 3, MaxIter = 100; end
if nargin < 2, EPS = 1e-8; end
N = size(A,2);
LAMBDA = []; V = []; ermsg = '';
for m = 1:N
   if norm(A(m:N,m) - A(m,m:N)') > EPS
      error('asymmetric matrix!');
   end
end
V = eye(N);
for k = 1:MaxIter
   for m = 1:N - 1 % largest off-diagonal element in each row
      [Am(m),Q(m)] = max(abs(A(m,m + 1:N)));
   end
   [Amm,p] = max(Am); q = p + Q(p);
   if Amm < EPS*sum(abs(diag(A))), break; end
   if abs(A(p,p)-A(q,q)) < EPS
      s2 = 1; s = 1/sqrt(2); c = s;
   else
      t2 = 2*A(p,q)/(A(p,p)- A(q,q)); %Eq.(8.4.9a)
      c2 = 1/sqrt(1 + t2*t2); s2 = t2*c2; %Eq.(8.4.9b,c)
      c = sqrt((1 + c2)/2); s = s2/2/c; %Eq.(8.4.9d,e)
   end
   cc = c*c; ss = s*s;
   LAMBDA = A;
   LAMBDA(p,:) = A(p,:)*c + A(q,:)*s; %Eq.(8.4.7b)
   LAMBDA(:,p) = LAMBDA(p,:)';
   LAMBDA(q,:) = -A(p,:)*s + A(q,:)*c; %Eq.(8.4.7c)
   LAMBDA(:,q) = LAMBDA(q,:)';
   LAMBDA(p,q) = 0; LAMBDA(q,p) = 0; %Eq.(8.4.7a)
   LAMBDA(p,p) = A(p,p)*cc +A(q,q)*ss + A(p,q)*s2; %Eq.(8.4.7d)
   LAMBDA(q,q) = A(p,p)*ss +A(q,q)*cc - A(p,q)*s2; %Eq.(8.4.7e)
   A = LAMBDA;
   V(:,[p q]) = V(:,[p q])*[c -s;s c];
end
if k == MaxIter, ermsg = 'Warning: you may have to increase MaxIter'; end
LAMBDA = diag(diag(A)); %for purification


%nm841 applies the Jacobi method
% to find all the eigenvalues/eigenvectors of a symmetric matrix A.
A = [2 0 1;0 -2 0;1 0 2];
EPS = 1e-8; MaxIter = 100;
[L,V] = eig_Jacobi(A,EPS,MaxIter)
disp('Using eig()')
[V,LAMBDA] = eig(A) %modal matrix composed of eigenvectors


What is left for us to think about is how to make this matrix (8.4.5) diagonal. Note that the similarity transformation (8.4.4) changes only the pth rows/columns and the qth rows/columns, as follows:

$$v_{pq} = v_{qp} = a_{pq}(c^2 - s^2) + (a_{qq} - a_{pp})sc = a_{pq}\cos 2\theta + \frac{1}{2}(a_{qq} - a_{pp})\sin 2\theta \tag{8.4.7a}$$

$$v_{pn} = v_{np} = a_{pn}c + a_{qn}s\quad\text{for the } p\text{th row/column with } n \ne p, q \tag{8.4.7b}$$

$$v_{qn} = v_{nq} = -a_{pn}s + a_{qn}c\quad\text{for the } q\text{th row/column with } n \ne p, q \tag{8.4.7c}$$

$$v_{pp} = a_{pp}c^2 + a_{qq}s^2 + 2a_{pq}sc = a_{pp}c^2 + a_{qq}s^2 + a_{pq}\sin 2\theta \tag{8.4.7d}$$

$$v_{qq} = a_{pp}s^2 + a_{qq}c^2 - 2a_{pq}sc = a_{pp}s^2 + a_{qq}c^2 - a_{pq}\sin 2\theta \tag{8.4.7e}$$

$$(c = \cos\theta,\ s = \sin\theta)$$

We make the (p, q) element $v_{pq}$ and the (q, p) element $v_{qp}$ zero

$$v_{pq} = v_{qp} = 0 \tag{8.4.8}$$

by choosing the angle $\theta$ of the rotation matrix $R_{pq}(\theta)$ in such a way that

$$\tan 2\theta = \frac{\sin 2\theta}{\cos 2\theta} = \frac{2a_{pq}}{a_{pp} - a_{qq}} \tag{8.4.9a}$$

$$\cos 2\theta = \frac{1}{\sec 2\theta} = \frac{1}{\sqrt{1 + \tan^2 2\theta}} \tag{8.4.9b}$$

$$\sin 2\theta = \tan 2\theta\,\cos 2\theta \tag{8.4.9c}$$

$$\cos\theta = \sqrt{\frac{1 + \cos 2\theta}{2}} \tag{8.4.9d}$$

$$\sin\theta = \frac{\sin 2\theta}{2\cos\theta} \tag{8.4.9e}$$

and computing the other associated elements according to Eqs. (8.4.7b-e).

There are a couple of things to note. First, in order to make the matrix closer to a diagonal one at each iteration, we should identify the row number and the column number of the largest off-diagonal element as p and q, respectively, and zero out the (p, q) element. Second, the other elements in the pth and qth rows/columns affected by this transformation do not grow overall. Eqs. (8.4.7b) and (8.4.7c) imply that the sum of their squares is preserved:

$$v_{pn}^2 + v_{qn}^2 = (a_{pn}c + a_{qn}s)^2 + (-a_{pn}s + a_{qn}c)^2 = a_{pn}^2 + a_{qn}^2 \tag{8.4.10}$$


This so-called Jacobi method is cast into the routine “eig_Jacobi()”. The MATLAB program “nm841.m” uses it to find the eigenvalues/eigenvectors of a matrix and compares the result with that of the MATLAB built-in routine “eig()” for cross-check. The result we may expect is as follows. Interested readers are welcome to run the program “nm841.m”.


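$$R_{13}^TAR_{13} = R_{13}^T\begin{bmatrix}2 & 0 & 1\\ 0 & -2 & 0\\ 1 & 0 & 2\end{bmatrix}R_{13} = \begin{bmatrix}3 & 0 & 0\\ 0 & -2 & 0\\ 0 & 0 & 1\end{bmatrix} = \Lambda\quad\text{with}\quad R_{13} = \begin{bmatrix}1/\sqrt{2} & 0 & -1/\sqrt{2}\\ 0 & 1 & 0\\ 1/\sqrt{2} & 0 & 1/\sqrt{2}\end{bmatrix} = V$$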
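For readers without MATLAB, here is a minimal plain-Python sketch of the same classical Jacobi iteration. It locates the largest off-diagonal $|a_{pq}|$, picks $\theta$ from (8.4.9a-e), applies $A \leftarrow R_{pq}^TAR_{pq}$, and accumulates $V \leftarrow VR_{pq}$ as in (8.4.6). It is a sketch patterned on “eig_Jacobi()”, not a transcription: it rotates whole rows and columns instead of using the element formulas (8.4.7), and its default of 500 iterations is our own choice.

```python
import math

def jacobi_eig(A, eps=1e-10, max_iter=500):
    """Classical Jacobi method for a real symmetric matrix (list of rows).
    Each iteration zeroes the largest off-diagonal element a_pq by R_pq(theta)."""
    n = len(A)
    A = [[float(v) for v in row] for row in A]        # work on a copy
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        p, q, amax = 0, 1, 0.0                        # largest |a_pq| above the diagonal
        for i in range(n - 1):
            for j in range(i + 1, n):
                if abs(A[i][j]) > amax:
                    p, q, amax = i, j, abs(A[i][j])
        if amax <= eps * sum(abs(A[i][i]) for i in range(n)):
            break
        if abs(A[p][p] - A[q][q]) < eps:              # theta = pi/4
            c = s = 1 / math.sqrt(2)
        else:
            t2 = 2 * A[p][q] / (A[p][p] - A[q][q])    # tan(2 theta), Eq. (8.4.9a)
            c2 = 1 / math.sqrt(1 + t2 * t2)           # cos(2 theta), Eq. (8.4.9b)
            s2 = t2 * c2                              # sin(2 theta), Eq. (8.4.9c)
            c = math.sqrt((1 + c2) / 2)               # cos(theta), Eq. (8.4.9d)
            s = s2 / (2 * c)                          # sin(theta), Eq. (8.4.9e)
        for k in range(n):                            # A <- A R_pq (columns p, q)
            akp, akq = A[k][p], A[k][q]
            A[k][p], A[k][q] = c * akp + s * akq, -s * akp + c * akq
        for k in range(n):                            # A <- R_pq^T A (rows p, q)
            apk, aqk = A[p][k], A[q][k]
            A[p][k], A[q][k] = c * apk + s * aqk, -s * apk + c * aqk
        for k in range(n):                            # V <- V R_pq, Eq. (8.4.6)
            vkp, vkq = V[k][p], V[k][q]
            V[k][p], V[k][q] = c * vkp + s * vkq, -s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V

A = [[2, 0, 1], [0, -2, 0], [1, 0, 2]]       # matrix of nm841.m
lams, V = jacobi_eig(A)
print(lams)                                  # one rotation with p, q = 1, 3 suffices here
```

As in the expected result above, a single rotation with θ = π/4 diagonalizes this particular matrix. Larger matrices generally need many sweeps, because zeroing one element can make a previously zeroed element nonzero again.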
8.5 PHYSICAL MEANING OF EIGENVALUES/EIGENVECTORS

According to Theorem 8.3 (Symmetric Diagonalization Theorem), introduced in the previous section, the eigenvectors $\{\mathbf{v}_n,\ n = 1 : N\}$ of an N x N symmetric matrix A constitute an orthonormal basis for an N-dimensional linear space:

$$V^TV = I,\qquad \mathbf{v}_m^T\mathbf{v}_n = \delta_{mn} = \begin{cases}1 & \text{if } m = n\\ 0 & \text{if } m \ne n\end{cases} \tag{8.5.1}$$

Consequently, any N-dimensional vector x can be expressed as a linear combination of these eigenvectors:

$$\mathbf{x} = \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \cdots + \alpha_N\mathbf{v}_N = \sum_{n=1}^{N}\alpha_n\mathbf{v}_n \tag{8.5.2}$$

Thus, the eigenvectors are called the principal axes of matrix A, and the squared norm of a vector is the sum of the squares of its components ($\alpha_n$'s) along the principal axes:

$$\|\mathbf{x}\|^2 = \mathbf{x}^T\mathbf{x} = \Big(\sum_{m=1}^{N}\alpha_m\mathbf{v}_m\Big)^T\Big(\sum_{n=1}^{N}\alpha_n\mathbf{v}_n\Big) \overset{(8.5.1)}{=} \sum_{n=1}^{N}\alpha_n^2 \tag{8.5.3}$$

Premultiplying Eq. (8.5.2) by the matrix A and using Eq. (8.1.1) yields

$$A\mathbf{x} = \lambda_1\alpha_1\mathbf{v}_1 + \lambda_2\alpha_2\mathbf{v}_2 + \cdots + \lambda_N\alpha_N\mathbf{v}_N = \sum_{n=1}^{N}\lambda_n\alpha_n\mathbf{v}_n \tag{8.5.4}$$

This shows that premultiplying a vector x by matrix A has the same effect as multiplying each principal component $\alpha_n$ of x along the direction of eigenvector $\mathbf{v}_n$ by the associated eigenvalue $\lambda_n$. Therefore, the solution of a homogeneous discrete-time state equation

$$\mathbf{x}(k+1) = A\mathbf{x}(k)\quad\text{with}\quad \mathbf{x}(0) = \sum_{n=1}^{N}\alpha_n\mathbf{v}_n \tag{8.5.5}$$

can be written as

$$\mathbf{x}(k) = \sum_{n=1}^{N}\lambda_n^k\alpha_n\mathbf{v}_n \tag{8.5.6}$$

which was illustrated by Eq. (E8.4.6) in Example 8.4. On the other hand, as illustrated by (E8.3.9) in Example 8.3(b), the solution of a homogeneous continuous-time state equation

$$\mathbf{x}'(t) = A\mathbf{x}(t)\quad\text{with}\quad \mathbf{x}(0) = \sum_{n=1}^{N}\alpha_n\mathbf{v}_n \tag{8.5.7}$$

can be written as

$$\mathbf{x}(t) = \sum_{n=1}^{N}e^{\lambda_n t}\alpha_n\mathbf{v}_n \tag{8.5.8}$$


Equations (8.5.6) and (8.5.8) imply that the eigenvalues of the system matrix characterize the principal modes of the system described by the state equations. The eigenvalues determine whether the system is stable or not, that is, whether the system state converges to an equilibrium state or diverges. They also determine how fast the system state proceeds along the direction of each eigenvector. More specifically, in the case of a discrete-time system, the absolute values of all the eigenvalues must be less than one for stability, and the smaller the absolute value of an eigenvalue (less than one) is, the faster the corresponding mode converges. In the case of a continuous-time system, the real parts of all the eigenvalues must be negative for stability, and the more negative the real part of an eigenvalue is, the faster the corresponding mode converges. The spread among the eigenvalues determines how stiff the system is (see Section 6.5.4). This meaning of eigenvalues/eigenvectors is very important in dynamic systems.

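Now, in order to figure out the meaning of eigenvalues/eigenvectors in static systems, we define the mean vector and the covariance matrix of the vectors $\{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(K)}\}$, which represent K points in a two-dimensional space called the $x_1x_2$ plane:

$$\mathbf{m}_x = \frac{1}{K}\sum_{k=1}^{K}\mathbf{x}^{(k)},\qquad C_x = \frac{1}{K}\sum_{k=1}^{K}[\mathbf{x}^{(k)} - \mathbf{m}_x][\mathbf{x}^{(k)} - \mathbf{m}_x]^T \tag{8.5.9}$$

Here the mean vector represents the center of the points, and the covariance matrix describes how widely the points are dispersed. Let us think about the geometrical meaning of diagonalizing the covariance matrix $C_x$. As a simple example, suppose we have four points

$$\mathbf{x}^{(1)} = \begin{bmatrix}-1\\ 0\end{bmatrix},\quad \mathbf{x}^{(2)} = \begin{bmatrix}0\\ -1\end{bmatrix},\quad \mathbf{x}^{(3)} = \begin{bmatrix}3\\ 2\end{bmatrix},\quad \mathbf{x}^{(4)} = \begin{bmatrix}2\\ 3\end{bmatrix} \tag{8.5.10}$$

for which the mean vector $\mathbf{m}_x$, the covariance matrix $C_x$, and its modal matrix are

$$\mathbf{m}_x = \begin{bmatrix}1\\ 1\end{bmatrix},\qquad C_x = \begin{bmatrix}2.5 & 2\\ 2 & 2.5\end{bmatrix},\qquad V = [\mathbf{v}_1\ \mathbf{v}_2] = \frac{1}{\sqrt{2}}\begin{bmatrix}1 & 1\\ -1 & 1\end{bmatrix} \tag{8.5.11}$$

Then, we can diagonalize the covariance matrix as

$$V^TC_xV = \frac{1}{\sqrt{2}}\begin{bmatrix}1 & -1\\ 1 & 1\end{bmatrix}\begin{bmatrix}2.5 & 2\\ 2 & 2.5\end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix}1 & 1\\ -1 & 1\end{bmatrix} = \begin{bmatrix}0.5 & 0\\ 0 & 4.5\end{bmatrix} = \Lambda \tag{8.5.12}$$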
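which has the eigenvalues on its main diagonal. On the other hand, suppose we transform the four point vectors by using the modal matrix as

$$\mathbf{y} = V^T(\mathbf{x} - \mathbf{m}_x) \tag{8.5.13}$$

Then the new four point vectors are

$$\mathbf{y}^{(1)} = \frac{1}{\sqrt{2}}\begin{bmatrix}-1\\ -3\end{bmatrix},\quad \mathbf{y}^{(2)} = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\ -3\end{bmatrix},\quad \mathbf{y}^{(3)} = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\ 3\end{bmatrix},\quad \mathbf{y}^{(4)} = \frac{1}{\sqrt{2}}\begin{bmatrix}-1\\ 3\end{bmatrix} \tag{8.5.14}$$

for which the mean vector $\mathbf{m}_y$ and the covariance matrix $C_y$ are

$$\mathbf{m}_y = V^T(\mathbf{m}_x - \mathbf{m}_x) = \begin{bmatrix}0\\ 0\end{bmatrix},\qquad C_y = V^TC_xV = \begin{bmatrix}0.5 & 0\\ 0 & 4.5\end{bmatrix} = \Lambda \tag{8.5.15}$$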

The original four points and the new points corresponding to them are depicted in Fig. 8.1. The figure shows that the eigenvectors of the covariance matrix for a set of point vectors represent the principal axes of the distribution, and its eigenvalues are the variances (mean squared spreads) of the distribution along those principal axes. The difference among the eigenvalues determines how oblong the overall shape of the distribution is.
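The computation behind (8.5.9)-(8.5.15) fits in a few lines. The plain-Python sketch below computes the mean vector and covariance matrix (normalized by K, as in (8.5.9)), diagonalizes the 2 x 2 covariance matrix in closed form, and projects the centered points onto the principal axes. The four points are our reading of the garbled (8.5.10), and they reproduce the eigenvalues 0.5 and 4.5 of (8.5.15). The helper names `mean_cov` and `sym2_eig` are hypothetical.

```python
import math

def mean_cov(points):
    """Mean vector and covariance matrix of Eq. (8.5.9), normalized by K."""
    K = len(points)
    m = [sum(p[i] for p in points) / K for i in range(2)]
    C = [[sum((p[i] - m[i]) * (p[j] - m[j]) for p in points) / K for j in range(2)]
         for i in range(2)]
    return m, C

def sym2_eig(C):
    """Eigenvalues (ascending) and unit eigenvectors of a symmetric 2 x 2 matrix."""
    a, b, d = C[0][0], C[0][1], C[1][1]
    mid, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    lams = [mid - rad, mid + rad]
    if b == 0:
        vecs = [[1.0, 0.0], [0.0, 1.0]] if a <= d else [[0.0, 1.0], [1.0, 0.0]]
    else:
        vecs = []
        for lam in lams:
            u = [b, lam - a]                      # solves (C - lam I) u = 0
            norm = math.hypot(u[0], u[1])
            vecs.append([u[0] / norm, u[1] / norm])
    return lams, vecs

points = [(-1, 0), (0, -1), (3, 2), (2, 3)]       # our reading of Eq. (8.5.10)
m, C = mean_cov(points)
lams, vecs = sym2_eig(C)
# y = V^T (x - m): coordinates of each centered point along the principal axes, Eq. (8.5.13)
ys = [[sum(vec[i] * (p[i] - m[i]) for i in range(2)) for vec in vecs] for p in points]
my, Cy = mean_cov(ys)
print(m, C, lams)
print(Cy)                                         # diagonal, with lams on the diagonal
```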

Before closing this section, we may think about the meaning of the determinant of a matrix composed of two two-dimensional vectors, and of one composed of three three-dimensional vectors.



Figure 8.1 Eigenvalues/eigenvectors of a covariance matrix. 
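First, let us consider a 2 x 2 matrix composed of two two-dimensional vectors $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$:

$$X = [\mathbf{x}^{(1)}\ \mathbf{x}^{(2)}] = \begin{bmatrix}x_{11} & x_{12}\\ x_{21} & x_{22}\end{bmatrix} \tag{8.5.16}$$

The absolute value of the determinant of this matrix

$$\det(X) = |X| = x_{11}x_{22} - x_{12}x_{21} \tag{8.5.17}$$

equals the area of the parallelogram having the two vectors as its two neighboring sides. To verify this fact, let us make a clockwise rotation of the two vectors by the phase angle of $\mathbf{x}^{(1)}$

$$\theta_1 = \tan^{-1}\frac{x_{21}}{x_{11}} \tag{8.5.18}$$

so that the new vector $\mathbf{y}^{(1)}$ corresponding to $\mathbf{x}^{(1)}$ becomes aligned with the $x_1$-axis (see Fig. 8.2). For this purpose, we multiply our matrix X by the rotation matrix defined by Eq. (8.4.2):

$$R(-\theta_1) = \begin{bmatrix}\cos\theta_1 & -\sin(-\theta_1)\\ \sin(-\theta_1) & \cos\theta_1\end{bmatrix} = \frac{1}{\sqrt{x_{11}^2 + x_{21}^2}}\begin{bmatrix}x_{11} & x_{21}\\ -x_{21} & x_{11}\end{bmatrix} \tag{8.5.19}$$

$$Y = R(-\theta_1)X = \frac{1}{\sqrt{x_{11}^2 + x_{21}^2}}\begin{bmatrix}x_{11} & x_{21}\\ -x_{21} & x_{11}\end{bmatrix}\begin{bmatrix}x_{11} & x_{12}\\ x_{21} & x_{22}\end{bmatrix} \tag{8.5.20a}$$

$$[\mathbf{y}^{(1)}\ \mathbf{y}^{(2)}] = \frac{1}{\sqrt{x_{11}^2 + x_{21}^2}}\begin{bmatrix}x_{11}^2 + x_{21}^2 & x_{11}x_{12} + x_{21}x_{22}\\ 0 & -x_{12}x_{21} + x_{11}x_{22}\end{bmatrix} \tag{8.5.20b}$$

The parallelograms having the original vectors and the new vectors as their two neighboring sides are depicted in Fig. 8.2. The areas of the parallelograms turn out to be equal to the absolute values of the determinants of the matrices X and Y, as follows:

$$\begin{aligned}\text{Area of the parallelograms} &= \text{Length of the bottom side} \times \text{Height of the parallelogram}\\ &= |(x_1\text{ component of } \mathbf{y}^{(1)}) \times (x_2\text{ component of } \mathbf{y}^{(2)})| = |y_{11}y_{22}| = |\det(Y)|\\ &= \left|\sqrt{x_{11}^2 + x_{21}^2} \times \frac{-x_{12}x_{21} + x_{11}x_{22}}{\sqrt{x_{11}^2 + x_{21}^2}}\right| = |\det(X)|\end{aligned} \tag{8.5.21}$$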
On extension of this result into a three-dimensional situation, the absolute value of the determinant of a 3 x 3 matrix composed of three three-dimensional vectors $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, and $\mathbf{x}^{(3)}$ equals the volume of the parallelepiped having the three vectors as its three edges:

$$\det(X) = |X| = |\mathbf{x}^{(1)}\ \mathbf{x}^{(2)}\ \mathbf{x}^{(3)}| = \begin{vmatrix}x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23}\\ x_{31} & x_{32} & x_{33}\end{vmatrix} = (\mathbf{x}^{(1)} \times \mathbf{x}^{(2)}) \cdot \mathbf{x}^{(3)} \tag{8.5.22}$$
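The rotation argument of (8.5.18)-(8.5.21) and the triple product of (8.5.22) can both be checked numerically. In the plain-Python sketch below, `area_by_rotation` rotates both vectors clockwise so that the first lies on the $x_1$-axis, and then multiplies base by height. `det3` evaluates $(\mathbf{x}^{(1)} \times \mathbf{x}^{(2)}) \cdot \mathbf{x}^{(3)}$. The function names and sample vectors are our own.

```python
import math

def det2(x1, x2):
    """det[x1 x2] for column vectors x1, x2, Eq. (8.5.17)."""
    return x1[0] * x2[1] - x2[0] * x1[1]

def area_by_rotation(x1, x2):
    """Rotate both vectors by R(-theta1) of Eq. (8.5.19), then base x height, Eq. (8.5.21)."""
    r = math.hypot(x1[0], x1[1])
    c, s = x1[0] / r, x1[1] / r                  # cos(theta1), sin(theta1)

    def rot(x):                                  # R(-theta1) x
        return (c * x[0] + s * x[1], -s * x[0] + c * x[1])

    y1, y2 = rot(x1), rot(x2)                    # y1 = (r, 0)
    return abs(y1[0] * y2[1])                    # |y11 * y22|

def det3(a, b, c):
    """Scalar triple product (a x b) . c = det[a b c], Eq. (8.5.22)."""
    cross = (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])
    return sum(ci * di for ci, di in zip(cross, c))

print(det2((3, 1), (1, 2)), area_by_rotation((3, 1), (1, 2)))
print(det3((1, 0, 0), (1, 2, 0), (0, 1, 3)))
```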


8.6 EIGENVALUE EQUATIONS 

In this section, we consider a system of ordinary differential equations that can 
be formulated as an eigenvalue problem. 

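For the undamped mass-spring system depicted in Fig. 8.3, the displacements $x_1(t)$ and $x_2(t)$ of the two masses $m_1$ and $m_2$ are described by the following system of differential equations:

$$\begin{bmatrix}x_1''(t)\\ x_2''(t)\end{bmatrix} = -\begin{bmatrix}(k_1 + k_2)/m_1 & -k_2/m_1\\ -k_2/m_2 & k_2/m_2\end{bmatrix}\begin{bmatrix}x_1(t)\\ x_2(t)\end{bmatrix}\quad\text{with}\quad \begin{bmatrix}x_1(0)\\ x_2(0)\end{bmatrix}\ \text{and}\ \begin{bmatrix}x_1'(0)\\ x_2'(0)\end{bmatrix}$$

$$\mathbf{x}''(t) = -A\mathbf{x}(t)\quad\text{with}\quad \mathbf{x}(0)\ \text{and}\ \mathbf{x}'(0) \tag{8.6.1}$$

Let the eigenpairs (eigenvalue-eigenvector pairs) of the matrix A be $(\lambda_n = \omega_n^2, \mathbf{v}_n)$ with

$$A\mathbf{v}_n = \omega_n^2\mathbf{v}_n \tag{8.6.2}$$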
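Figure 8.3 An undamped mass-spring system.

Noting that the solution of Eq. (8.5.7) can be written as Eq. (8.5.8) in terms of the eigenvectors of the system matrix, we write the solution of Eq. (8.6.1) as

$$\mathbf{x}(t) = \sum_{n=1}^{2}w_n(t)\mathbf{v}_n = [\mathbf{v}_1\ \mathbf{v}_2]\begin{bmatrix}w_1(t)\\ w_2(t)\end{bmatrix} = V\mathbf{w}(t) \tag{8.6.3}$$

and substitute this into Eq. (8.6.1) to have

$$\sum_{n=1}^{2}w_n''(t)\mathbf{v}_n = -A\sum_{n=1}^{2}w_n(t)\mathbf{v}_n \overset{(8.6.2)}{=} -\sum_{n=1}^{2}\omega_n^2w_n(t)\mathbf{v}_n \tag{8.6.4}$$

Since $\mathbf{v}_1$ and $\mathbf{v}_2$ are linearly independent, the coefficients of each $\mathbf{v}_n$ must match:

$$w_n''(t) = -\omega_n^2w_n(t)\quad\text{for } n = 1, 2 \tag{8.6.5}$$

The solution of this equation is

$$w_n(t) = w_n(0)\cos(\omega_n t) + \frac{w_n'(0)}{\omega_n}\sin(\omega_n t)\quad\text{with}\quad \omega_n = \sqrt{\lambda_n}\ \text{for } n = 1, 2 \tag{8.6.6}$$

where the initial value of $\mathbf{w}(t) = [w_1(t)\ w_2(t)]^T$ can be obtained via Eq. (8.6.3) from that of $\mathbf{x}(t)$ as

$$\mathbf{w}(0) \overset{(8.6.3)}{=} V^{-1}\mathbf{x}(0),\qquad \mathbf{w}'(0) = V^{-1}\mathbf{x}'(0) \tag{8.6.7}$$

When A is symmetric (e.g., when $m_1 = m_2$), the eigenvectors can be normalized as in Eq. (8.5.1) so that $V^{-1} = V^T$.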

Finally, we substitute Eq. (8.6.6) into Eq. (8.6.3) to obtain the solution of 
Eq. (8.6.1). 
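The following plain-Python sketch carries out the modal solution (8.6.2)-(8.6.7) for the two-mass system and compares it with a direct RK4 integration of $\mathbf{x}'' = -A\mathbf{x}$. Because it uses $V^{-1}$ rather than $V^T$, it also covers the non-symmetric case $m_1 \ne m_2$. All masses, spring constants, and initial conditions below are hypothetical test values.

```python
import math

def modal_solution(m1, m2, k1, k2, x0, xd0, t):
    """x(t) of Eq. (8.6.1) via the modal route, Eqs. (8.6.2)-(8.6.7)."""
    a, b = (k1 + k2) / m1, -k2 / m1                  # A = [[a, b], [c, d]]
    c, d = -k2 / m2, k2 / m2
    mid, rad = (a + d) / 2, math.sqrt(((a - d) / 2) ** 2 + b * c)
    lams = [mid - rad, mid + rad]                    # omega_n^2 > 0 for positive m's and k's
    V = [[b, b], [lams[0] - a, lams[1] - a]]         # column n solves (A - lam_n I) v = 0
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    Vinv = [[V[1][1] / det, -V[0][1] / det], [-V[1][0] / det, V[0][0] / det]]
    w0 = [Vinv[i][0] * x0[0] + Vinv[i][1] * x0[1] for i in range(2)]     # Eq. (8.6.7)
    wd0 = [Vinv[i][0] * xd0[0] + Vinv[i][1] * xd0[1] for i in range(2)]
    w = []
    for n in range(2):
        om = math.sqrt(lams[n])
        w.append(w0[n] * math.cos(om * t) + wd0[n] / om * math.sin(om * t))  # Eq. (8.6.6)
    return [V[i][0] * w[0] + V[i][1] * w[1] for i in range(2)]          # Eq. (8.6.3)

def rk4_mass_spring(m1, m2, k1, k2, x0, xd0, t, steps=4000):
    """Reference solution: RK4 on the first-order form z = [x1, x2, x1', x2']."""
    A = [[(k1 + k2) / m1, -k2 / m1], [-k2 / m2, k2 / m2]]

    def f(z):
        return [z[2], z[3],
                -(A[0][0] * z[0] + A[0][1] * z[1]),
                -(A[1][0] * z[0] + A[1][1] * z[1])]

    z, h = list(x0) + list(xd0), t / steps
    for _ in range(steps):
        s1 = f(z)
        s2 = f([zi + h / 2 * si for zi, si in zip(z, s1)])
        s3 = f([zi + h / 2 * si for zi, si in zip(z, s2)])
        s4 = f([zi + h * si for zi, si in zip(z, s3)])
        z = [zi + h / 6 * (p + 2 * q + 2 * r + u)
             for zi, p, q, r, u in zip(z, s1, s2, s3, s4)]
    return z[:2]

err_sym = max(abs(p - q) for p, q in zip(
    modal_solution(1, 1, 1, 1, [1.0, 0.0], [0.0, 0.0], 2.0),
    rk4_mass_spring(1, 1, 1, 1, [1.0, 0.0], [0.0, 0.0], 2.0)))
err_asym = max(abs(p - q) for p, q in zip(
    modal_solution(1, 2, 1, 3, [0.5, -0.2], [0.1, 0.3], 1.5),
    rk4_mass_spring(1, 2, 1, 3, [0.5, -0.2], [0.1, 0.3], 1.5)))
print(err_sym, err_asym)
```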

PROBLEMS 

8.1 Symmetric Tridiagonal Toeplitz Matrix 

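Consider the following N x N symmetric tridiagonal Toeplitz matrix:

$$A = \begin{bmatrix}a & b & 0 & \cdots & 0 & 0\\ b & a & b & \cdots & 0 & 0\\ 0 & b & a & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & \cdots & a & b\\ 0 & 0 & 0 & \cdots & b & a\end{bmatrix} \tag{P8.1.1}$$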
(a) Verify that the eigenvalues and eigenvectors of this matrix are as follows, with N = 3 for convenience.

$$\lambda_n = a + 2b\cos\frac{n\pi}{N+1} \tag{P8.1.2}$$

$$\mathbf{v}_n = \left[\sin\frac{n\pi}{N+1}\ \ \sin\frac{2n\pi}{N+1}\ \ \cdots\ \ \sin\frac{Nn\pi}{N+1}\right]^T\quad\text{for } n = 1 : N \tag{P8.1.3}$$

(b) Letting N = 3, a = 2, and b = 1, find the eigenvalues/eigenvectors of the above matrix by using (P8.1.2,3), and by using the MATLAB routine “eig_Jacobi()” or “eig()” for cross-check.
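A cross-check of the closed-form eigenpairs does not need a matrix library. Assuming the standard closed form $\lambda_n = a + 2b\cos(n\pi/(N+1))$ and $v_n[k] = \sin(kn\pi/(N+1))$, the plain-Python sketch below builds the matrix (P8.1.1) and reports the largest entry of $A\mathbf{v}_n - \lambda_n\mathbf{v}_n$ over all n. The identity $\sin((k-1)\theta) + \sin((k+1)\theta) = 2\sin(k\theta)\cos\theta$, together with $\sin(0) = \sin((N+1)\theta) = 0$, is what makes the residual vanish.

```python
import math

def toeplitz_tridiag(N, a, b):
    """N x N symmetric tridiagonal Toeplitz matrix of Eq. (P8.1.1)."""
    return [[a if i == j else (b if abs(i - j) == 1 else 0.0) for j in range(N)]
            for i in range(N)]

def closed_form_eig(N, a, b):
    """Assumed closed form: lam_n = a + 2b cos(n pi/(N+1)), v_n[k] = sin(k n pi/(N+1))."""
    pairs = []
    for n in range(1, N + 1):
        th = n * math.pi / (N + 1)
        pairs.append((a + 2 * b * math.cos(th),
                      [math.sin(k * th) for k in range(1, N + 1)]))
    return pairs

def max_residual(N, a, b):
    """Largest |(A v - lam v)_i| over all closed-form eigenpairs."""
    T = toeplitz_tridiag(N, a, b)
    return max(abs(sum(T[i][j] * v[j] for j in range(N)) - lam * v[i])
               for lam, v in closed_form_eig(N, a, b) for i in range(N))

print([round(lam, 4) for lam, _ in closed_form_eig(3, 2.0, 1.0)], max_residual(3, 2.0, 1.0))
```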

8.2 Circulant Matrix 

Consider the following N x N circulant matrix:

$$A = \begin{bmatrix}h(0) & h(N-1) & h(N-2) & \cdots & h(1)\\ h(1) & h(0) & h(N-1) & \cdots & h(2)\\ h(2) & h(1) & h(0) & \cdots & h(3)\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ h(N-1) & h(N-2) & h(N-3) & \cdots & h(0)\end{bmatrix} \tag{P8.2.1}$$

(a) Verify that the eigenvalues and eigenvectors of this matrix are as follows, with N = 4 for convenience.

$$\lambda_n = h(0) + h(N-1)e^{j2\pi n/N} + h(N-2)e^{j2\pi 2n/N} + \cdots + h(1)e^{j2\pi(N-1)n/N} \tag{P8.2.2}$$

$$\mathbf{v}_n = \left[1\ \ e^{j2\pi n/N}\ \ e^{j2\pi 2n/N}\ \ \cdots\ \ e^{j2\pi(N-1)n/N}\right]^T\quad\text{for } n = 0 \text{ to } N - 1 \tag{P8.2.3}$$

(b) Letting N = 4, h(0) = 2, h(3) = h(1) = 1, and h(2) = 0, find the eigenvalues/eigenvectors of the above matrix by using (P8.2.2,3) and by using the MATLAB routine “eig_Jacobi()” or “eig()”. Do they agree? Do they satisfy Eq. (8.1.1)?
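Part (b) can be cross-checked with complex arithmetic from the standard library. The plain-Python sketch below builds the circulant matrix (P8.2.1) by the rule $C_{ij} = h((i-j) \bmod N)$. It forms $\lambda_n$ and $\mathbf{v}_n$ exactly as in (P8.2.2,3) and measures how well $C\mathbf{v}_n = \lambda_n\mathbf{v}_n$ holds. The indexing rule and the helper names are our own way of encoding the matrix.

```python
import cmath

def circulant(h):
    """C[i][j] = h((i - j) mod N): first row h(0), h(N-1), ..., h(1), as in Eq. (P8.2.1)."""
    N = len(h)
    return [[h[(i - j) % N] for j in range(N)] for i in range(N)]

def circulant_eig(h):
    """Eqs. (P8.2.2), (P8.2.3): v_n[k] = e^{j2pi kn/N}, lam_n = sum_k h((-k) mod N) v_n[k]."""
    N = len(h)
    pairs = []
    for n in range(N):
        v = [cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)]
        pairs.append((sum(h[(-k) % N] * v[k] for k in range(N)), v))
    return pairs

def max_residual(h):
    """Largest |(C v - lam v)_i| over all eigenpairs of Eqs. (P8.2.2,3)."""
    C, N = circulant(h), len(h)
    return max(abs(sum(C[i][j] * v[j] for j in range(N)) - lam * v[i])
               for lam, v in circulant_eig(h) for i in range(N))

h = [2, 1, 0, 1]          # h(0) = 2, h(1) = h(3) = 1, h(2) = 0, as in part (b)
print([round(lam.real, 6) for lam, _ in circulant_eig(h)], max_residual(h))
```

Since h(1) = h(3), this particular circulant matrix is symmetric, which is why “eig_Jacobi()” also applies to it.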

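8.3 Solving a Vector Differential Equation by Decoupling: Diagonalization.
Consider the following two-dimensional vector differential equation (state equation):

$$\begin{bmatrix}x_1'(t)\\ x_2'(t)\end{bmatrix} = \begin{bmatrix}0 & 1\\ -2 & -3\end{bmatrix}\begin{bmatrix}x_1(t)\\ x_2(t)\end{bmatrix} + \begin{bmatrix}0\\ 1\end{bmatrix}u_s(t)\quad\text{with}\quad \begin{bmatrix}x_1(0)\\ x_2(0)\end{bmatrix} = \begin{bmatrix}1\\ 0\end{bmatrix} \tag{P8.3.1}$$

This equation was solved by using the Laplace transform in Problem P6.1. In this problem, we solve it again by the decoupling method through diagonalization of the system matrix.

(a) Show that the eigenvalues and eigenvectors of the system matrix are as follows.

$$\lambda_1 = -1,\ \lambda_2 = -2;\qquad \mathbf{v}_1 = \begin{bmatrix}1\\ -1\end{bmatrix},\ \mathbf{v}_2 = \begin{bmatrix}1\\ -2\end{bmatrix} \tag{P8.3.2}$$

(b) Show that the diagonalization of the above vector differential equation using the modal matrix $V = [\mathbf{v}_1\ \mathbf{v}_2]$ yields the following equation:

$$\begin{bmatrix}w_1'(t)\\ w_2'(t)\end{bmatrix} = \begin{bmatrix}-1 & 0\\ 0 & -2\end{bmatrix}\begin{bmatrix}w_1(t)\\ w_2(t)\end{bmatrix} + \begin{bmatrix}1\\ -1\end{bmatrix}u_s(t)\quad\text{with}\quad \begin{bmatrix}w_1(0)\\ w_2(0)\end{bmatrix} = \begin{bmatrix}2\\ -1\end{bmatrix} \tag{P8.3.3}$$

(c) Show that these equations can be solved individually by using the Laplace transform technique to yield the following solution, which is the same as Eq. (P6.1.2) obtained in Problem P6.1(a).

$$W_1(s) = \frac{1}{s} + \frac{1}{s+1},\qquad w_1(t) = (1 + e^{-t})u_s(t) \tag{P8.3.4a}$$

$$W_2(s) = -\frac{1/2}{s} - \frac{1/2}{s+2},\qquad w_2(t) = -\frac{1}{2}(1 + e^{-2t})u_s(t) \tag{P8.3.4b}$$

$$\begin{bmatrix}x_1(t)\\ x_2(t)\end{bmatrix} = V\mathbf{w}(t) = \begin{bmatrix}1/2 + e^{-t} - (1/2)e^{-2t}\\ -e^{-t} + e^{-2t}\end{bmatrix}u_s(t) \tag{P8.3.5}$$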


8.4 Householder Method and QR Factorization

This method can zero out several elements in a column vector at each iteration and make any N x N matrix an (upper) triangular matrix in (N - 1) iterations.

(a) Householder Reflection (Fig. P8.4)

Show that the transformation matrix by which we can multiply a vector x to generate another vector y having the same norm is

$$H = [I - 2\mathbf{w}\mathbf{w}^T]\quad\text{with}\quad \mathbf{w} = \frac{\mathbf{x} - \mathbf{y}}{\|\mathbf{x} - \mathbf{y}\|_2} = \frac{1}{c}(\mathbf{x} - \mathbf{y}),\quad c = \|\mathbf{x} - \mathbf{y}\|_2,\quad \|\mathbf{x}\| = \|\mathbf{y}\| \tag{P8.4.1}$$

Show also that this is an orthonormal symmetric matrix such that $H^TH = HH = I$; $H^{-1} = H$. Note the following facts.
Figure P8.4 Householder reflection.


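(i) From Eq. (P8.4.1),

$$\mathbf{y} = \mathbf{x} - (\mathbf{x} - \mathbf{y}) = \mathbf{x} - c\mathbf{w} \tag{P8.4.2a}$$

(ii)

$$\mathbf{w}^T\mathbf{w} \overset{(P8.4.1)}{=} 1\quad\text{and}\quad \|\mathbf{x}\| = \|\mathbf{y}\| \tag{P8.4.2b}$$

(iii)

$$\mathbf{m} = (\mathbf{x} + \mathbf{y})/2 = \mathbf{x} - (c/2)\mathbf{w} \tag{P8.4.2c}$$

(iv) The mean vector m of x and y is orthogonal to the difference vector $\mathbf{w} = (\mathbf{x} - \mathbf{y})/c$, since $(\mathbf{x} + \mathbf{y})^T(\mathbf{x} - \mathbf{y}) = \|\mathbf{x}\|^2 - \|\mathbf{y}\|^2 = 0$.

Thus we have

$$\mathbf{w}^T(\mathbf{x} - (c/2)\mathbf{w}) = 0;\qquad \mathbf{w}^T\mathbf{x} - (c/2)\mathbf{w}^T\mathbf{w} = \mathbf{w}^T\mathbf{x} - (c/2) = 0 \tag{P8.4.3}$$

This gives an expression for $c = \|\mathbf{x} - \mathbf{y}\|_2$ as

$$c = \|\mathbf{x} - \mathbf{y}\|_2 = 2\mathbf{w}^T\mathbf{x} \tag{P8.4.4}$$

We can substitute this into (P8.4.2a) to get the desired result:

$$\mathbf{y} = \mathbf{x} - c\mathbf{w} = \mathbf{x} - 2\mathbf{w}\mathbf{w}^T\mathbf{x} = [I - 2\mathbf{w}\mathbf{w}^T]\mathbf{x} = H\mathbf{x} \tag{P8.4.5}$$

On the other hand, the Householder transform matrix is an orthogonal matrix, since

$$H^TH = HH = [I - 2\mathbf{w}\mathbf{w}^T][I - 2\mathbf{w}\mathbf{w}^T] = I - 4\mathbf{w}\mathbf{w}^T + 4\mathbf{w}(\mathbf{w}^T\mathbf{w})\mathbf{w}^T = I - 4\mathbf{w}\mathbf{w}^T + 4\mathbf{w}\mathbf{w}^T = I \tag{P8.4.6}$$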

(b) Householder Transform

In order to show that the Householder matrix can be used to zero out some part of a vector, let us find the kth Householder matrix $H_k$ transforming any vector

$$\mathbf{x} = [x_1\ \cdots\ x_{k-1}\ x_k\ x_{k+1}\ \cdots\ x_N]^T \tag{P8.4.7}$$

into

$$\mathbf{y} = [x_1\ \cdots\ x_{k-1}\ -g_k\ 0\ \cdots\ 0]^T \tag{P8.4.8}$$

where $g_k$ is fixed in such a way that the norms of these two vectors are the same:

$$g_k = \sqrt{\sum_{n=k}^{N}x_n^2} \tag{P8.4.9}$$

First, we find the difference vector of unit norm as

$$\mathbf{w}_k = \frac{1}{c}(\mathbf{x} - \mathbf{y}) = \frac{1}{c}[0\ \cdots\ 0\ \ x_k + g_k\ \ x_{k+1}\ \cdots\ x_N]^T \tag{P8.4.10}$$

with

$$c = \|\mathbf{x} - \mathbf{y}\|_2 = \sqrt{(x_k + g_k)^2 + x_{k+1}^2 + \cdots + x_N^2} \tag{P8.4.11}$$

Then, one more thing we should do is to substitute this difference vector into Eq. (P8.4.1):

$$H_k = [I - 2\mathbf{w}_k\mathbf{w}_k^T] \tag{P8.4.12}$$

Complete the following routine “Householder()” by permuting the statements. Then try it with k = 1, 2, 3, and 4 for a five-dimensional vector generated by the MATLAB command rand(5,1) to check if it works fine.

>> x = rand(5,1), for k = 1:4, Householder(x,k)*x, end


function H = Householder(x,k)
%Householder transform to zero out tail part starting from k + 1
H = eye(N) - 2*w*w'; %Householder matrix
N = length(x);
w = zeros(N,1);
w(k) = (x(k) + g)/c; w(k + 1:N) = x(k + 1:N)/c; %Eq.(P8.4.10)
tmp = sum(x(k + 1:N).^2);
c = sqrt((x(k) + g)^2 + tmp); %Eq.(P8.4.11)
g = sqrt(x(k)^2 + tmp); %Eq.(P8.4.9)


(c) QR Factorization Using Householder Transform

We can use the Householder transform to zero out the part under the main diagonal of each column of an N x N matrix A successively, and thereby make it an upper triangular matrix R in (N - 1) iterations. The necessary operations are collectively written as

$$H_{N-1}H_{N-2}\cdots H_1A = R \tag{P8.4.13}$$

which implies that

$$A = [H_{N-1}H_{N-2}\cdots H_1]^{-1}R = H_1^{-1}\cdots H_{N-2}^{-1}H_{N-1}^{-1}R = H_1\cdots H_{N-2}H_{N-1}R = QR \tag{P8.4.14}$$

where the product of all the Householder matrices

$$Q = H_1\cdots H_{N-2}H_{N-1} \tag{P8.4.15}$$

turns out to be orthogonal like each $H_k$. Unlike each $H_k$, however, Q is not symmetric in general:

$$Q^TQ = [H_1\cdots H_{N-2}H_{N-1}]^TH_1\cdots H_{N-2}H_{N-1} = H_{N-1}H_{N-2}\cdots H_1H_1\cdots H_{N-2}H_{N-1} = I$$

This suggests a QR factorization method that is cast into the following routine “qr_my()”. You can try it for a nonsingular 3 x 3 matrix generated by the MATLAB command rand(3) and compare the result with that of the MATLAB built-in routine “qr()”.


function [Q,R] = qr_my(A)
%QR factorization
N = size(A,1); R = A; Q = eye(N);
for k = 1:N - 1
  H = Householder(R(:,k),k);
  R = H*R; %Eq.(P8.4.13)
  Q = Q*H; %Eq.(P8.4.15)
end
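For readers who want to check the idea outside MATLAB, here is a minimal Python sketch (standard library only) of the same procedure: the reflection vector of Eqs. (P8.4.9) to (P8.4.11) and the accumulation $R = H_{N-1} \cdots H_1 A$, $Q = H_1 \cdots H_{N-1}$ of qr_my(). The function names householder_vector and qr_householder are hypothetical, indices are 0-based, and, as a design choice of this sketch rather than of the book's routine, each $H_k$ is applied as $R - 2\mathbf{w}(\mathbf{w}^T R)$ instead of being formed explicitly.

```python
import math

def householder_vector(x, k):
    """Unit vector w of Eqs. (P8.4.9)-(P8.4.11), with k a 0-based index.

    (I - 2 w w^T) x keeps x[0:k], turns x[k] into -g and zeroes x[k+1:].
    """
    n = len(x)
    tail = sum(xi * xi for xi in x[k + 1:])
    g = math.sqrt(x[k] ** 2 + tail)          # Eq. (P8.4.9)
    c = math.sqrt((x[k] + g) ** 2 + tail)    # Eq. (P8.4.11)
    w = [0.0] * n
    if c == 0.0:                             # nothing to zero out: H = I
        return w
    w[k] = (x[k] + g) / c                    # Eq. (P8.4.10)
    for i in range(k + 1, n):
        w[i] = x[i] / c
    return w

def qr_householder(A):
    """Return (Q, R) with A = Q R, following the loop of qr_my()."""
    n = len(A)
    R = [row[:] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        w = householder_vector([R[i][k] for i in range(n)], k)
        # R <- H R = R - 2 w (w^T R), Eq. (P8.4.13)
        wR = [sum(w[i] * R[i][j] for i in range(n)) for j in range(n)]
        for i in range(n):
            for j in range(n):
                R[i][j] -= 2.0 * w[i] * wR[j]
        # Q <- Q H = Q - 2 (Q w) w^T, Eq. (P8.4.15)
        Qw = [sum(Q[i][j] * w[j] for j in range(n)) for i in range(n)]
        for i in range(n):
            for j in range(n):
                Q[i][j] -= 2.0 * Qw[i] * w[j]
    return Q, R

A = [[4.0, 1.0, 2.0], [2.0, 3.0, 1.0], [1.0, 0.5, 5.0]]
Q, R = qr_householder(A)
print([[round(v, 6) for v in row] for row in R])   # below-diagonal entries ~ 0
```

Because $H_k$ is never formed, each reflection costs $O(N^2)$ operations instead of the $O(N^3)$ of the product H*R. As with qr(), the factors are unique only up to the signs of corresponding columns of Q and rows of R.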


8.5 Hessenberg Form Using Householder Transform 


function [Hs,HH] = Hessenberg(A)
%Transform into an almost upper triangular matrix
% having only zeros below lower subdiagonal
N = size(A,1); Hs = A; HH = eye(N); %HH*A*HH' = Hs
for k = 1:N - 2
  H = Householder(Hs(:,k), );
  Hs = H*Hs*H; HH = H*HH;
end


We can make use of the Householder transform (introduced in Problem 8.4) to zero out the elements below the lower subdiagonal of a matrix so that it becomes an upper Hessenberg form, which is an almost upper triangular matrix. Complete the above routine “Hessenberg()” by filling in the second input argument of the routine “Householder()”, and try it for a 5 × 5 matrix generated by the MATLAB command rand(5) to check that it works.
8.6 QR Factorization of Hessenberg Form Using the Givens Rotation

We can make use of the Givens rotation to get the QR factorization of a Hessenberg form by the procedure implemented in the following routine “qr_Hessenberg()”, where one element on the lower subdiagonal is zeroed out at each iteration. Generate a 4 × 4 random matrix A by the MATLAB command rand(4), transform it into a Hessenberg form Hs by using the routine “Hessenberg()”, and try the routine “qr_Hessenberg()” for the matrix of Hessenberg form. Check the validity by seeing whether norm(Hs - Q*R) ≈ 0 or not.


function [Q,R] = qr_Hessenberg(Hs)
%QR factorization of Hessenberg form by Givens rotation
N = size(Hs,1); Q = eye(N); R = Hs;
for k = 1:N - 1
  x = R(k,k); y = R(k+1,k); r = sqrt(x*x + y*y);
  c = x/r; s = -y/r;
  R0 = R; Q0 = Q;
  R(k,:) = c*R0(k,:) - s*R0(k + 1,:); %rotate rows k and k+1 of R
  R(k + 1,:) = s*R0(k,:) + c*R0(k + 1,:); %this zeroes out R(k+1,k)
  Q(:,k) = c*Q0(:,k) - s*Q0(:,k + 1); %rotate columns k and k+1 of Q
  Q(:,k + 1) = s*Q0(:,k) + c*Q0(:,k + 1);
end
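A minimal Python sketch of the same Givens procedure follows (standard library only; the name qr_hessenberg_givens is hypothetical and indices are 0-based). It applies the same row updates to R and column updates to Q as qr_Hessenberg(). It also adds one guard that the MATLAB routine lacks: when both R(k,k) and R(k+1,k) are zero, the rotation is skipped instead of dividing by r = 0.

```python
import math

def qr_hessenberg_givens(Hs):
    """QR factorization of an upper Hessenberg matrix by N-1 Givens rotations."""
    n = len(Hs)
    R = [row[:] for row in Hs]
    Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        x, y = R[k][k], R[k + 1][k]
        r = math.hypot(x, y)
        if r == 0.0:
            continue                        # subdiagonal entry already zero
        c, s = x / r, -y / r
        for j in range(n):                  # rotate rows k, k+1 of R
            a, b = R[k][j], R[k + 1][j]
            R[k][j] = c * a - s * b
            R[k + 1][j] = s * a + c * b     # makes R[k+1][k] zero
        for i in range(n):                  # rotate columns k, k+1 of Q
            a, b = Q[i][k], Q[i][k + 1]
            Q[i][k] = c * a - s * b
            Q[i][k + 1] = s * a + c * b
    return Q, R

Hs = [[4.0, 1.0, 2.0, 3.0],
      [2.0, 3.0, 1.0, 1.0],
      [0.0, 1.0, 5.0, 2.0],
      [0.0, 0.0, 2.0, 1.0]]
Q, R = qr_hessenberg_givens(Hs)
```

Only N − 1 rotations are needed because a Hessenberg matrix has a single nonzero entry below the diagonal in each column. This is why the QR step is much cheaper after the Hessenberg reduction.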


8.7 Diagonalization by Using QR Factorization to Find Eigenvalues 

You will see that a real symmetric matrix A can be diagonalized into a 
diagonal matrix having the eigenvalues on its diagonal if we repeat the 
similarity transformation by using the orthogonal matrix Q obtained from the 
QR factorization. For this purpose, take the following steps. 


function [eigs,A] = eig_QR(A,kmax)
%Find eigenvalues by using QR factorization
if nargin < 2, kmax = 200; end
for k = 1:kmax
  [Q,R] = qr(A); %A = Q*R; R = Q'*A = Q^-1*A
  A = R*Q; %A = Q^-1*A*Q
end
eigs = diag(A);

function [eigs,A] = eig_QR_Hs(A,kmax)
%Find eigenvalues by using QR factorization via Hessenberg form
if nargin < 2, kmax = 200; end
Hs = Hessenberg(A);
for k = 1:kmax
  [Q,R] = qr_Hessenberg(Hs); %Hs = Q*R; R = Q'*Hs = Q^-1*Hs
  Hs = R*Q; %Hs = Q^-1*Hs*Q
end
eigs = diag(Hs);
(a) Make the above routine “eig_QR()” that uses the MATLAB built-in routine “qr()”, and then apply it to a 4 × 4 random symmetric matrix A generated by the following MATLAB statements.

» A = rand(4); A = A + A';

(b) Make the above routine “eig_QR_Hs()” that transforms a given matrix into a Hessenberg form by using the routine “Hessenberg()” (from Problem 8.5). It then repeatedly performs the QR factorization by using the routine “qr_Hessenberg()” (from Problem 8.6), followed by the similarity transformation by the orthogonal matrix Q, until the matrix becomes diagonal. Apply it to the 4 × 4 random symmetric matrix A generated in (a), and compare the result with those obtained in (a) and by using the MATLAB built-in routine “eig()” for cross-check.
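The iteration of eig_QR() can be sketched in Python as follows (standard library only, hypothetical names qr_mgs and eig_qr). To stay self-contained, this sketch swaps in modified Gram-Schmidt for the QR step in place of MATLAB's qr() or the Householder/Givens routines of Problems 8.4 to 8.6. It assumes A is nonsingular, so that no column collapses to zero, and it performs the same unshifted update $A \leftarrow RQ = Q^T A Q$.

```python
import math

def qr_mgs(A):
    """QR factorization of a nonsingular square matrix by modified Gram-Schmidt."""
    n = len(A)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(n)]                 # j-th column of A
        for i, q in enumerate(q_cols):
            R[i][j] = sum(qk * vk for qk, vk in zip(q, v))
            v = [vk - R[i][j] * qk for qk, vk in zip(q, v)]
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))
        q_cols.append([vk / R[j][j] for vk in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def eig_qr(A, kmax=200):
    """Unshifted QR iteration: A <- R Q, which equals Q^T A Q."""
    n = len(A)
    for _ in range(kmax):
        Q, R = qr_mgs(A)
        A = [[sum(R[i][m] * Q[m][j] for m in range(n)) for j in range(n)]
             for i in range(n)]
    return [A[i][i] for i in range(n)], A

eigs, Ak = eig_qr([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
print(sorted(eigs))
```

Unshifted QR iteration converges at a rate governed by the ratios of the eigenvalue magnitudes, so nearly equal |λ| values slow it down. Practical eigenvalue codes add shifts, which this sketch does not.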

8.8 Differential/Difference Equation, State Equation, and Eigenvalue 

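As mentioned in Section 6.5.3, a high-order scalar differential equation such as

$$x^{(3)}(t) + a_2 x^{(2)}(t) + a_1 x'(t) + a_0 x(t) = u(t) \qquad \text{(P8.8.1)}$$

can be transformed into a first-order vector differential equation, called a state equation, as

$$\begin{bmatrix} x_1'(t) \\ x_2'(t) \\ x_3'(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -a_0 & -a_1 & -a_2 \end{bmatrix} \begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} u(t) \qquad \text{(P8.8.2a)}$$

$$x(t) = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix} \qquad \text{(P8.8.2b)}$$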


The characteristic equation of the differential equation (P8.8.1) is

$$s^3 + a_2 s^2 + a_1 s + a_0 = 0 \qquad \text{(P8.8.3)}$$

and its roots are called the characteristic roots.

(a) What is the relationship between these characteristic roots and the eigenvalues of the system matrix A of the above state equation (P8.8.2)? To answer this question, write the equation $|\lambda I - A| = 0$ to solve for the eigenvalues of A, and show that it is equivalent to Eq. (P8.8.3). To extend your experience, or just for practice, you can try the symbolic computation of MATLAB by running the following program “nm8p08a.m”.
%nm8p08a
syms a0 a1 a2 s
A = [0 1 0; 0 0 1; -a0 -a1 -a2]; %(P8.8.2a)
det(s*eye(size(A)) - A) %characteristic polynomial
ch_eq = poly(A) %or, equivalently


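(b) Let the input u(t) in the state equation (P8.8.2) be dependent on the state as

$$u(t) = K\mathbf{x}(t) = [K_0\ \ K_1\ \ K_2]\,\mathbf{x}(t) = K_0 x_1(t) + K_1 x_2(t) + K_2 x_3(t) \qquad \text{(P8.8.4)}$$

Then, the state equation can be written as

$$\begin{bmatrix} x_1'(t) \\ x_2'(t) \\ x_3'(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ K_0 - a_0 & K_1 - a_1 & K_2 - a_2 \end{bmatrix} \begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix} \qquad \text{(P8.8.5)}$$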

If the parameters of the original system matrix are $a_0 = 1$, $a_1 = -2$, and $a_2 = 3$, what values of the gain matrix $K = [K_0\ K_1\ K_2]$ will you fix so that the virtual system matrix in the state equation (P8.8.5) has the eigenvalues $\lambda = -1, -2$, and $-3$? Note that the characteristic equation of the system whose behavior is described by the state equation (P8.8.5) is

$$s^3 + (a_2 - K_2)s^2 + (a_1 - K_1)s + a_0 - K_0 = 0 \qquad \text{(P8.8.6)}$$

and the equation having the roots $\lambda = -1, -2$, and $-3$ is

$$(s + 1)(s + 2)(s + 3) = s^3 + 6s^2 + 11s + 6 = 0 \qquad \text{(P8.8.7)}$$
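To check your answer to (b), the coefficient matching between Eqs. (P8.8.6) and (P8.8.7) can be automated. The following Python sketch (standard library only, hypothetical function names poly_from_roots and companion_gains) expands the desired polynomial from its roots and solves $a_m - K_m = d_m$ for each $K_m$. It assumes the companion form (P8.8.5), in which K enters only the last row.

```python
def poly_from_roots(roots):
    """Coefficients of prod(s - root), highest power first."""
    coeffs = [1.0]
    for rt in roots:
        # multiply the current polynomial by (s - rt)
        coeffs = [hi - rt * lo for hi, lo in zip(coeffs + [0.0], [0.0] + coeffs)]
    return coeffs

def companion_gains(a, desired_roots):
    """Gains [K0, K1, ...] such that s^n + sum((a_m - K_m) s^m) has desired_roots.

    a = [a0, a1, ..., a_{n-1}] are the coefficients of Eq. (P8.8.3).
    """
    d = poly_from_roots(desired_roots)[::-1][:-1]   # [d0, d1, ..., d_{n-1}]
    return [am - dm for am, dm in zip(a, d)]

print(poly_from_roots([-1, -2, -3]))   # -> [1.0, 6.0, 11.0, 6.0], Eq. (P8.8.7)
K = companion_gains([1, -2, 3], [-1, -2, -3])
```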

8.9 A Homogeneous Differential Equation — An Eigenvalue Equation 

Consider the undamped mass-spring system depicted in Fig. 8.3, where the masses and the spring constants are $m_1 = 1$, $m_2 = 1$ [kg] and $k_1 = 5$, $k_2 = 10$ [N/m], respectively. Complete the following program “nm8p09.m”, whose objective is to solve the second-order differential equation (8.6.1) with the initial conditions $[x_1(0), x_2(0), x_1'(0), x_2'(0)] = [1, -0.5, 0, 0]$ over the time interval [0, 10] in two ways: by using the ODE solver “ode45()” (Section 6.5.1) and by using the eigenvalue method (Section 8.6). Plot the two solutions. Run the completed program to obtain the solution graphs for $x_1(t)$ and $x_2(t)$.

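(cf) Note that the second-order vector differential equation (8.6.1) can be written as the following state equation:

$$\begin{bmatrix} \mathbf{x}'(t) \\ \mathbf{x}''(t) \end{bmatrix} = \begin{bmatrix} O & I \\ -A & O \end{bmatrix} \begin{bmatrix} \mathbf{x}(t) \\ \mathbf{x}'(t) \end{bmatrix} \qquad \text{(P8.9.1)}$$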
%nm8p09.m: solve a set of differential eqs. (a state equation)
clear, clf
global A
df = 'df861';
k1 = 5; k2 = 10; m1 = 1; m2 = 1; % the spring constants and the masses
A = [(k1 + k2)/m1 -k2/m1; -k2/m2 k2/m2]; NA = size(A,2);
t0 = 0; tf = ??; x0 = [? ? ? ?]; % initial/final time, initial values
[t4,x4] = ode45(df,[t0 tf],x0);
[V,LAMBDA] = eig(A); % modal matrix composed of eigenvectors
w0 = x0(1:NA)*V; w10 = x0(NA+1:end)*V; % Eq.(8.6.8)
omega = ??????????????????;
for n = 1:NA % Eq.(8.6.7)
  omegan = omega(n);
  w(:,n) = [cos(omegan*t4) sin(omegan*t4)]*[w0(n); w10(n)/omegan];
end
xE = w*V.'; % Eq.(8.6.3)
for n = 1:NA
  subplot(311 + n), plot(t4,x4(:,n),'b', t4,xE(:,n),'r')
end


function dx = df861(t,x)
global A
NA = size(A,2);
if length(x) ~= 2*NA, error('Some dimension problem'); end
dx = [zeros(NA) eye(NA); -A zeros(NA)]*x(:); %Eq.(P8.9.1)








PARTIAL DIFFERENTIAL 
EQUATIONS 


What is a partial differential equation (PDE)? It is a class of differential equations involving more than one independent variable. In this chapter, we consider a general second-order PDE in two independent variables x and y, which is written as

$$A(x, y)\frac{\partial^2 u}{\partial x^2} + B(x, y)\frac{\partial^2 u}{\partial x\,\partial y} + C(x, y)\frac{\partial^2 u}{\partial y^2} = f\left(x, y, u, \frac{\partial u}{\partial x}, \frac{\partial u}{\partial y}\right) \quad \text{for } x_0 \le x \le x_f,\ y_0 \le y \le y_f \qquad (9.0.1)$$

with the boundary conditions given by

$$u(x, y_0) = b_{y0}(x),\quad u(x, y_f) = b_{yf}(x),\quad u(x_0, y) = b_{x0}(y),\quad u(x_f, y) = b_{xf}(y) \qquad (9.0.2)$$

These PDEs are classified into three groups:

Elliptic PDE: if $B^2 - 4AC < 0$
Parabolic PDE: if $B^2 - 4AC = 0$
Hyperbolic PDE: if $B^2 - 4AC > 0$


These three types of PDE are associated with equilibrium states, diffusion states, 
and oscillating systems, respectively. We will study some numerical methods for 
solving these PDEs, since their analytical solutions are usually difficult to find. 
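The classification is a pointwise test on the coefficients of Eq. (9.0.1), since A, B, and C may vary with (x, y). Below is a minimal Python sketch (hypothetical function name classify_pde) applied to the three model equations of this chapter. They are written in the variables (x, t), with the time variable playing the role of y. The optional tolerance is an assumption of this sketch, meant for coefficients computed in floating point.

```python
def classify_pde(A, B, C, tol=0.0):
    """Type of A u_xx + B u_xy + C u_yy = f(...) at one point, by B^2 - 4AC."""
    disc = B * B - 4.0 * A * C
    if disc < -tol:
        return "elliptic"
    if disc > tol:
        return "hyperbolic"
    return "parabolic"

print(classify_pde(1.0, 0.0, 1.0))    # Laplace u_xx + u_yy = 0   -> elliptic
print(classify_pde(1.0, 0.0, 0.0))    # heat u_xx = u_t (no u_tt)  -> parabolic
print(classify_pde(1.0, 0.0, -1.0))   # wave u_xx - u_tt = 0       -> hyperbolic
```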


Applied Numerical Methods Using MATLAB ®, by Yang, Cao, Chung, and Morris 
Copyright © 2005 John Wiley & Sons, Inc., ISBN 0-471-69833-4 
9.1 ELLIPTIC PDE

As an example, we will deal with a special type of elliptic equation called Helmholtz's equation, which is written as

$$\nabla^2 u(x, y) + g(x, y)u(x, y) = \frac{\partial^2 u(x, y)}{\partial x^2} + \frac{\partial^2 u(x, y)}{\partial y^2} + g(x, y)u(x, y) = f(x, y) \qquad (9.1.1)$$

over a domain $D = \{(x, y) \mid x_0 \le x \le x_f,\ y_0 \le y \le y_f\}$ with some boundary conditions of

$$u(x_0, y) = b_{x0}(y),\quad u(x_f, y) = b_{xf}(y),\quad u(x, y_0) = b_{y0}(x),\quad u(x, y_f) = b_{yf}(x) \qquad (9.1.2)$$

(cf) Equation (9.1.1) is called Poisson's equation if g(x, y) = 0, and it is called Laplace's equation if g(x, y) = 0 and f(x, y) = 0.


To apply the difference method, we divide the domain into $M_x$ sections, each of length $\Delta x = (x_f - x_0)/M_x$ along the x-axis, and into $M_y$ sections, each of length $\Delta y = (y_f - y_0)/M_y$ along the y-axis. We then replace the second derivatives by the three-point central difference approximation (5.3.1)

$$\left.\frac{\partial^2 u(x, y)}{\partial x^2}\right|_{(x_j, y_i)} \cong \frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{\Delta x^2} \quad \text{with } x_j = x_0 + j\Delta x,\ y_i = y_0 + i\Delta y \qquad (9.1.3a)$$

$$\left.\frac{\partial^2 u(x, y)}{\partial y^2}\right|_{(x_j, y_i)} \cong \frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{\Delta y^2} \quad \text{with } u_{i,j} = u(x_j, y_i) \qquad (9.1.3b)$$

so that, for every interior point $(x_j, y_i)$ with $1 \le i \le M_y - 1$ and $1 \le j \le M_x - 1$, we obtain the finite difference equation

$$\frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{\Delta x^2} + \frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{\Delta y^2} + g_{i,j}u_{i,j} = f_{i,j} \qquad (9.1.4)$$

where

$$u_{i,j} = u(x_j, y_i),\quad f_{i,j} = f(x_j, y_i),\quad \text{and}\quad g_{i,j} = g(x_j, y_i)$$


These equations can be arranged into a system of simultaneous equations with respect to the $(M_y - 1)(M_x - 1)$ variables $\{u_{1,1}, u_{1,2}, \ldots, u_{1,M_x-1}, u_{2,1}, \ldots, u_{2,M_x-1}, \ldots, u_{M_y-1,1}, \ldots, u_{M_y-1,M_x-1}\}$. However, that system is messy to work with, and it becomes very hard to handle as $M_x$ and $M_y$ grow large. A simpler way is to use the iterative methods introduced in Section 2.5. To do so, we first need to shape the equations and the boundary conditions into the following form:


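$$u_{i,j} = r_y(u_{i,j+1} + u_{i,j-1}) + r_x(u_{i+1,j} + u_{i-1,j}) + r_{xy}(g_{i,j}u_{i,j} - f_{i,j}) \qquad (9.1.5a)$$

$$u_{i,0} = b_{x0}(y_i),\quad u_{i,M_x} = b_{xf}(y_i),\quad u_{0,j} = b_{y0}(x_j),\quad u_{M_y,j} = b_{yf}(x_j) \qquad (9.1.5b)$$

where

$$r_y = \frac{\Delta y^2}{2(\Delta x^2 + \Delta y^2)},\quad r_x = \frac{\Delta x^2}{2(\Delta x^2 + \Delta y^2)},\quad r_{xy} = \frac{\Delta x^2 \Delta y^2}{2(\Delta x^2 + \Delta y^2)} \qquad (9.1.6)$$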


How do we initialize this algorithm? If we have no a priori knowledge about the solution, it is reasonable to take the average of the boundary values as the initial values of $u_{i,j}$.

The objective of the MATLAB routine “poisson.m” is to solve the above equation.
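The source of “poisson.m” is not reproduced here, so what follows is only a minimal Python sketch of the idea (hypothetical name poisson_gs, standard library only). It sweeps Eq. (9.1.5a) over the interior points in Gauss-Seidel fashion, using each updated value immediately, holds the Dirichlet values (9.1.5b) fixed, and starts from the average of the boundary values. The stopping tolerance and iteration cap are assumptions of this sketch.

```python
def poisson_gs(f, g, bx0, bxf, by0, byf, D, Mx, My, tol=1e-10, kmax=10000):
    """Solve u_xx + u_yy + g u = f on D = [x0, xf, y0, yf] with Dirichlet BCs.

    Returns (x, y, u) where u[i][j] approximates u(x[j], y[i]).
    """
    x0, xf, y0, yf = D
    dx, dy = (xf - x0) / Mx, (yf - y0) / My
    x = [x0 + j * dx for j in range(Mx + 1)]
    y = [y0 + i * dy for i in range(My + 1)]
    den = 2.0 * (dx * dx + dy * dy)
    ry, rx, rxy = dy * dy / den, dx * dx / den, dx * dx * dy * dy / den  # (9.1.6)
    u = [[0.0] * (Mx + 1) for _ in range(My + 1)]
    for i in range(My + 1):                                             # (9.1.5b)
        u[i][0], u[i][Mx] = bx0(y[i]), bxf(y[i])
    for j in range(Mx + 1):
        u[0][j], u[My][j] = by0(x[j]), byf(x[j])
    edge = ([u[i][0] for i in range(My + 1)] + [u[i][Mx] for i in range(My + 1)]
            + u[0][1:Mx] + u[My][1:Mx])
    avg = sum(edge) / len(edge)                       # initial guess
    for i in range(1, My):
        for j in range(1, Mx):
            u[i][j] = avg
    for _ in range(kmax):
        change = 0.0
        for i in range(1, My):
            for j in range(1, Mx):
                new = (ry * (u[i][j + 1] + u[i][j - 1])
                       + rx * (u[i + 1][j] + u[i - 1][j])
                       + rxy * (g(x[j], y[i]) * u[i][j] - f(x[j], y[i])))  # (9.1.5a)
                change = max(change, abs(new - u[i][j]))
                u[i][j] = new
        if change < tol:
            break
    return x, y, u

# Poisson test: u = x^2 + y^2 satisfies u_xx + u_yy = 4 (f = 4, g = 0)
exact = lambda x, y: x * x + y * y
xs, ys, us = poisson_gs(lambda x, y: 4.0, lambda x, y: 0.0,
                        lambda y: exact(0.0, y), lambda y: exact(1.0, y),
                        lambda x: exact(x, 0.0), lambda x: exact(x, 1.0),
                        [0.0, 1.0, 0.0, 1.0], 8, 8)
```

The five-point difference (9.1.4) is exact for quadratics, so this test should recover $x^2 + y^2$ at the grid points up to the iteration tolerance.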
Example 9.1. Laplace's Equation: Steady-State Temperature Distribution. Consider Laplace's equation

$$\nabla^2 u(x, y) = \frac{\partial^2 u(x, y)}{\partial x^2} + \frac{\partial^2 u(x, y)}{\partial y^2} = 0 \quad \text{for } 0 \le x \le 4,\ 0 \le y \le 4 \qquad (E9.1.1)$$

with the boundary conditions

$$u(0, y) = e^y - \cos y,\quad u(4, y) = e^y \cos 4 - e^4 \cos y \qquad (E9.1.2)$$

$$u(x, 0) = \cos x - e^x,\quad u(x, 4) = e^4 \cos x - e^x \cos 4 \qquad (E9.1.3)$$

What we will get from solving this equation is u(x, y), which supposedly describes the temperature distribution over a square plate having each side 4 units long (Fig. 9.1). We made the MATLAB program “solve_poisson.m” in order to use the routine “poisson()” to solve Laplace's equation given above, and ran this program to obtain the result shown in Fig. 9.2.

Now, let us consider the so-called Neumann boundary conditions described as

$$\left.\frac{\partial u(x, y)}{\partial x}\right|_{x = x_0} = b'_{x0}(y) \quad \text{for } x = x_0 \text{ (the left-side boundary)} \qquad (9.1.7)$$


[Figure 9.1 panel labels: Dirichlet-type boundary condition (function value fixed); Neumann-type boundary condition (derivative fixed); grid spacing Δx along the x-axis.]

Figure 9.1 The grid for elliptic equations with Dirichlet/Neumann-type boundary condition.
Replacing the first derivative on the left-side boundary ($x = x_0$) by its three-point central difference approximation (5.1.8)

$$\frac{u_{i,1} - u_{i,-1}}{2\Delta x} \cong b'_{x0}(y_i),\quad u_{i,-1} \cong u_{i,1} - 2b'_{x0}(y_i)\Delta x \quad \text{for } i = 1, 2, \ldots, M_y - 1 \qquad (9.1.8)$$

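and then substituting this constraint into Eq. (9.1.5a) at the boundary points, we have

$$\begin{aligned}
u_{i,0} &= r_y(u_{i,1} + u_{i,-1}) + r_x(u_{i+1,0} + u_{i-1,0}) + r_{xy}(g_{i,0}u_{i,0} - f_{i,0}) \\
&= r_y(u_{i,1} + u_{i,1} - 2b'_{x0}(y_i)\Delta x) + r_x(u_{i+1,0} + u_{i-1,0}) + r_{xy}(g_{i,0}u_{i,0} - f_{i,0}) \\
&= 2r_y u_{i,1} + r_x(u_{i+1,0} + u_{i-1,0}) + r_{xy}(g_{i,0}u_{i,0} - f_{i,0} - 2b'_{x0}(y_i)/\Delta x)
\end{aligned} \quad \text{for } i = 1, 2, \ldots, M_y - 1 \qquad (9.1.9)$$

where the last step uses $r_y \Delta x = r_{xy}/\Delta x$, which follows from Eq. (9.1.6).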

If the boundary condition on the lower-side boundary ($y = y_0$) is also of Neumann type, then we need to write similar equations for $j = 1, 2, \ldots, M_x - 1$:

$$u_{0,j} = r_y(u_{0,j+1} + u_{0,j-1}) + 2r_x u_{1,j} + r_{xy}(g_{0,j}u_{0,j} - f_{0,j} - 2b'_{y0}(x_j)/\Delta y) \qquad (9.1.10)$$

and additionally, for the lower-left corner point $(x_0, y_0)$,

$$u_{0,0} = 2(r_y u_{0,1} + r_x u_{1,0}) + r_{xy}\big(g_{0,0}u_{0,0} - f_{0,0} - 2(b'_{x0}(y_0)/\Delta x + b'_{y0}(x_0)/\Delta y)\big) \qquad (9.1.11)$$
9.2 PARABOLIC PDE

An example of a parabolic PDE is a one-dimensional heat equation describing the temperature distribution u(x, t) (x is position, t is time) as

$$A\frac{\partial^2 u(x, t)}{\partial x^2} = \frac{\partial u(x, t)}{\partial t} \quad \text{for } 0 \le x \le x_f,\ 0 \le t \le T \qquad (9.2.1)$$

For this equation to be solvable, the boundary conditions $u(0, t) = b_0(t)$ and $u(x_f, t) = b_{xf}(t)$, as well as the initial condition $u(x, 0) = i_0(x)$, must be provided.


9.2.1 The Explicit Forward Euler Method 

To apply the finite difference method, we divide the spatial domain $[0, x_f]$ into M sections, each of length $\Delta x = x_f/M$, and the time domain $[0, T]$ into N segments, each of duration $\Delta t = T/N$. We then replace the second partial derivative on the left-hand side and the first partial derivative on the right-hand side of Eq. (9.2.1) by the central difference approximation (5.3.1) and the forward difference approximation (5.1.4), respectively, so that we have

$$A\frac{u^k_{i+1} - 2u^k_i + u^k_{i-1}}{\Delta x^2} = \frac{u^{k+1}_i - u^k_i}{\Delta t} \qquad (9.2.2)$$


This can be cast into the following algorithm, called the explicit forward Euler method, which is to be solved iteratively:

$$u^{k+1}_i = r(u^k_{i+1} + u^k_{i-1}) + (1 - 2r)u^k_i \quad \text{with } r = A\frac{\Delta t}{\Delta x^2} \qquad (9.2.3)$$

for $i = 1, 2, \ldots, M - 1$

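To find the stability condition of this algorithm, we substitute a trial solution

$$u^k_i = \lambda^k e^{j i\pi/P} \quad (P \text{ is any nonzero integer}) \qquad (9.2.4)$$

into Eq. (9.2.3) to get

$$\lambda = r(e^{j\pi/P} + e^{-j\pi/P}) + (1 - 2r) = 1 - 2r(1 - \cos(\pi/P)) \qquad (9.2.5)$$

Since we must have $|\lambda| \le 1$ for nondivergence, that is, $2r(1 - \cos(\pi/P)) \le 2$ for every P, where $1 - \cos(\pi/P)$ can be as large as 2, the stability condition turns out to be

$$r = A\frac{\Delta t}{\Delta x^2} \le \frac{1}{2} \qquad (9.2.6)$$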




function [u,x,t] = heat_exp(a,xf,T,it0,bx0,bxf,M,N)
%solve a u_xx = u_t for 0 <= x <= xf, 0 <= t <= T
% Initial Condition: u(x,0) = it0(x)
% Boundary Condition: u(0,t) = bx0(t), u(xf,t) = bxf(t)
% M = # of subintervals along x axis
% N = # of subintervals along t axis
dx = xf/M; x = [0:M]'*dx;
dt = T/N; t = [0:N]*dt;
for i = 1:M + 1, u(i,1) = it0(x(i)); end
for n = 1:N + 1, u([1 M + 1],n) = [bx0(t(n)); bxf(t(n))]; end
r = a*dt/dx/dx, r1 = 1 - 2*r;
for k = 1:N
  for i = 2:M
    u(i,k+1) = r*(u(i + 1,k) + u(i - 1,k)) + r1*u(i,k); %Eq.(9.2.3)
  end
end



This implies that as we decrease the spatial interval Δx for better accuracy, we must also decrease the time step Δt, at the cost of more computations, in order not to lose stability. Since $\Delta t \le \Delta x^2/(2A)$, halving Δx forces Δt to be quartered.

The MATLAB routine “heat_exp()” has been composed to implement this algorithm.
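The same update can be sketched in Python (standard library only; the name heat_explicit is hypothetical). In this sketch the rows of u are time levels, whereas heat_exp() stores time levels as columns. The boundary values are imposed at every new time level, and the initial row is taken from it0 alone.

```python
import math

def heat_explicit(a, xf, T, it0, bx0, bxf, M, N):
    """Explicit forward Euler, Eq. (9.2.3), for a u_xx = u_t.

    Returns (x, u, r) with u[k][i] approximating u(x[i], k*dt).
    """
    dx, dt = xf / M, T / N
    x = [i * dx for i in range(M + 1)]
    r = a * dt / dx ** 2            # stability needs r <= 1/2, Eq. (9.2.6)
    u = [[it0(xi) for xi in x]]
    for k in range(1, N + 1):
        prev = u[-1]
        new = ([bx0(k * dt)]
               + [r * (prev[i + 1] + prev[i - 1]) + (1 - 2 * r) * prev[i]
                  for i in range(1, M)]
               + [bxf(k * dt)])
        u.append(new)
    return x, u, r

# Example 9.2 with M = 20, N = 100 (r = 0.4)
x, u, r = heat_explicit(1.0, 1.0, 0.1, lambda x: math.sin(math.pi * x),
                        lambda t: 0.0, lambda t: 0.0, 20, 100)
err = max(abs(ui - math.sin(math.pi * xi) * math.exp(-math.pi ** 2 * 0.1))
          for xi, ui in zip(x, u[-1]))
```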

9.2.2 The Implicit Backward Euler Method 

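In this section, we consider another algorithm called the implicit backward Euler method, which comes from substituting the backward difference approximation (5.1.6) for the first partial derivative on the right-hand side of Eq. (9.2.1):

$$A\frac{u^k_{i+1} - 2u^k_i + u^k_{i-1}}{\Delta x^2} = \frac{u^k_i - u^{k-1}_i}{\Delta t} \qquad (9.2.7)$$

$$-r u^k_{i-1} + (1 + 2r)u^k_i - r u^k_{i+1} = u^{k-1}_i \quad \text{with } r = A\frac{\Delta t}{\Delta x^2} \qquad (9.2.8)$$

for $i = 1, 2, \ldots, M - 1$

If the values of $u^k_0$ and $u^k_M$ at both end points are given from the Dirichlet type of boundary condition, then the above equation will be cast into a system of simultaneous equations:

$$\begin{bmatrix}
1 + 2r & -r & 0 & \cdots & 0 & 0 \\
-r & 1 + 2r & -r & \cdots & 0 & 0 \\
0 & -r & 1 + 2r & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 + 2r & -r \\
0 & 0 & 0 & \cdots & -r & 1 + 2r
\end{bmatrix}
\begin{bmatrix} u^k_1 \\ u^k_2 \\ u^k_3 \\ \vdots \\ u^k_{M-2} \\ u^k_{M-1} \end{bmatrix}
=
\begin{bmatrix} u^{k-1}_1 + r u^k_0 \\ u^{k-1}_2 \\ u^{k-1}_3 \\ \vdots \\ u^{k-1}_{M-2} \\ u^{k-1}_{M-1} + r u^k_M \end{bmatrix} \qquad (9.2.9)$$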
What if the value $\partial u/\partial x|_{x=0} = b'_0(t)$ is given at one end instead? In that case, we approximate this Neumann type of boundary condition by

$$\frac{u^k_1 - u^k_{-1}}{2\Delta x} = b'_0(k) \qquad (9.2.10)$$

and combine it with one more equation associated with the unknown variable $u^k_0$,

$$-r u^k_{-1} + (1 + 2r)u^k_0 - r u^k_1 = u^{k-1}_0 \qquad (9.2.11)$$

to get

$$(1 + 2r)u^k_0 - 2r u^k_1 = u^{k-1}_0 - 2r b'_0(k)\Delta x \qquad (9.2.12)$$

We augment Eq. (9.2.9) with this to write

$$\begin{bmatrix}
1 + 2r & -2r & 0 & \cdots & 0 & 0 \\
-r & 1 + 2r & -r & \cdots & 0 & 0 \\
0 & -r & 1 + 2r & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 + 2r & -r \\
0 & 0 & 0 & \cdots & -r & 1 + 2r
\end{bmatrix}
\begin{bmatrix} u^k_0 \\ u^k_1 \\ u^k_2 \\ \vdots \\ u^k_{M-2} \\ u^k_{M-1} \end{bmatrix}
=
\begin{bmatrix} u^{k-1}_0 - 2r b'_0(k)\Delta x \\ u^{k-1}_1 \\ u^{k-1}_2 \\ \vdots \\ u^{k-1}_{M-2} \\ u^{k-1}_{M-1} + r u^k_M \end{bmatrix} \qquad (9.2.13)$$

Equations such as Eq. (9.2.9) or (9.2.13) are really nice in the sense that they can be solved very efficiently by exploiting their tridiagonal structure, and, owing to their diagonal dominance, they can be solved stably without pivoting. The unconditional stability of Eq. (9.2.9) can be shown by substituting Eq. (9.2.4) into Eq. (9.2.8):

$$-r e^{-j\pi/P} + (1 + 2r) - r e^{j\pi/P} = 1/\lambda, \qquad \lambda = \frac{1}{1 + 2r(1 - \cos(\pi/P))}, \qquad |\lambda| \le 1 \qquad (9.2.14)$$

The following routine “heat_imp()” implements this algorithm to solve the 
PDE (9.2.1) with the ordinary (Dirichlet type of) boundary condition via Eq. (9.2.9). 


function [u,x,t] = heat_imp(a,xf,T,it0,bx0,bxf,M,N)
%solve a u_xx = u_t for 0 <= x <= xf, 0 <= t <= T
% Initial Condition: u(x,0) = it0(x)
% Boundary Condition: u(0,t) = bx0(t), u(xf,t) = bxf(t)
% M = # of subintervals along x axis
% N = # of subintervals along t axis
dx = xf/M; x = [0:M]'*dx;
dt = T/N; t = [0:N]*dt;
for i = 1:M + 1, u(i,1) = it0(x(i)); end
for n = 1:N + 1, u([1 M + 1],n) = [bx0(t(n)); bxf(t(n))]; end
r = a*dt/dx/dx; r2 = 1 + 2*r;
for i = 1:M - 1
  A(i,i) = r2; %Eq.(9.2.9)
  if i > 1, A(i - 1,i) = -r; A(i,i - 1) = -r; end
end
for k = 2:N + 1
  b = [r*u(1,k); zeros(M - 3,1); r*u(M + 1,k)] + u(2:M,k - 1); %Eq.(9.2.9)
  u(2:M,k) = trid(A,b);
end
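The routine trid() used above is the book's tridiagonal solver and is not listed here. As a self-contained stand-in, the following Python sketch (hypothetical names thomas and heat_implicit, standard library only) uses the Thomas algorithm. This is Gaussian elimination specialized to a tridiagonal matrix. It needs no pivoting for diagonally dominant systems such as Eq. (9.2.9) and costs O(M) operations per time step.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[i] multiplies x[i-1], upper[i] multiplies x[i+1].

    lower[0] and upper[-1] do not affect the result.
    """
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):                       # forward elimination
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    sol = [0.0] * n
    sol[-1] = d[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        sol[i] = d[i] - c[i] * sol[i + 1]
    return sol

def heat_implicit(a, xf, T, it0, bx0, bxf, M, N):
    """Implicit backward Euler, Eq. (9.2.9); u[k][i] approximates u(x[i], k*dt)."""
    dx, dt = xf / M, T / N
    x = [i * dx for i in range(M + 1)]
    r = a * dt / dx ** 2
    n = M - 1
    lower, diag, upper = [-r] * n, [1 + 2 * r] * n, [-r] * n
    u = [[it0(xi) for xi in x]]
    for k in range(1, N + 1):
        left, right = bx0(k * dt), bxf(k * dt)
        rhs = u[-1][1:M]                        # u^{k-1}_1, ..., u^{k-1}_{M-1}
        rhs[0] += r * left
        rhs[-1] += r * right
        u.append([left] + thomas(lower, diag, upper, rhs) + [right])
    return x, u

# Example 9.2 with M = 25, N = 100: r = 0.625 > 1/2, yet the scheme stays stable
x, u = heat_implicit(1.0, 1.0, 0.1, lambda x: math.sin(math.pi * x),
                     lambda t: 0.0, lambda t: 0.0, 25, 100)
```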
9.2.3 The Crank-Nicholson Method

Here, let us go back to Eq. (9.2.7) and try to improve the implicit backward Euler method. The difference approximation on the left-hand side is taken at time point k. The difference approximation on the right-hand side, if we regard it as the central difference approximation with time step Δt/2, is taken at the midpoint between times k and k − 1. This is inconsistent. For balance, we can instead take the difference approximations of both sides at the same time point, say, the midpoint between k + 1 and k. To do so, we take the average of the central difference approximations of the left-hand side at the two points k + 1 and k, yielding

$$\frac{A}{2}\left(\frac{u^{k+1}_{i+1} - 2u^{k+1}_i + u^{k+1}_{i-1}}{\Delta x^2} + \frac{u^k_{i+1} - 2u^k_i + u^k_{i-1}}{\Delta x^2}\right) = \frac{u^{k+1}_i - u^k_i}{\Delta t} \qquad (9.2.15)$$

which leads to the so-called Crank-Nicholson method:

$$-r u^{k+1}_{i+1} + 2(1 + r)u^{k+1}_i - r u^{k+1}_{i-1} = r u^k_{i+1} + 2(1 - r)u^k_i + r u^k_{i-1} \quad \text{with } r = A\frac{\Delta t}{\Delta x^2} \qquad (9.2.16)$$

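With the Dirichlet/Neumann type of boundary condition on $x_0$/$x_M$, respectively, this can be cast into the following tridiagonal system of equations. Here $u_{M+1}$ has been eliminated at both time levels by using the central difference approximation $(u^k_{M+1} - u^k_{M-1})/(2\Delta x) = b'_M(k)$ of the Neumann condition $\partial u/\partial x|_{x=x_M} = b'_M(t)$:

$$\begin{bmatrix}
2(1 + r) & -r & 0 & \cdots & 0 & 0 \\
-r & 2(1 + r) & -r & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & -r & 2(1 + r) & -r \\
0 & 0 & \cdots & 0 & -2r & 2(1 + r)
\end{bmatrix}
\begin{bmatrix} u^{k+1}_1 \\ u^{k+1}_2 \\ \vdots \\ u^{k+1}_{M-1} \\ u^{k+1}_M \end{bmatrix}
=
\begin{bmatrix}
2(1 - r)u^k_1 + r u^k_2 + r(u^k_0 + u^{k+1}_0) \\
r u^k_1 + 2(1 - r)u^k_2 + r u^k_3 \\
\vdots \\
r u^k_{M-2} + 2(1 - r)u^k_{M-1} + r u^k_M \\
2r u^k_{M-1} + 2(1 - r)u^k_M + 2r\Delta x\,(b'_M(k) + b'_M(k+1))
\end{bmatrix} \qquad (9.2.17)$$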

This system of equations can also be solved very efficiently, and its unconditional stability can be shown by substituting Eq. (9.2.4) into Eq. (9.2.16):

$$2\lambda(1 + r(1 - \cos(\pi/P))) = 2(1 - r(1 - \cos(\pi/P))), \qquad \lambda = \frac{1 - r(1 - \cos(\pi/P))}{1 + r(1 - \cos(\pi/P))}, \qquad |\lambda| \le 1 \qquad (9.2.18)$$


This algorithm is cast into the following MATLAB routine “heat_CN()”. 





function [u,x,t] = heat_CN(a,xf,T,it0,bx0,bxf,M,N)
%solve a u_xx = u_t for 0 <= x <= xf, 0 <= t <= T
% Initial Condition: u(x,0) = it0(x)
% Boundary Condition: u(0,t) = bx0(t), u(xf,t) = bxf(t)
% M = # of subintervals along x axis
% N = # of subintervals along t axis
dx = xf/M; x = [0:M]'*dx;
dt = T/N; t = [0:N]*dt;
for i = 1:M + 1, u(i,1) = it0(x(i)); end
for n = 1:N + 1, u([1 M + 1],n) = [bx0(t(n)); bxf(t(n))]; end
r = a*dt/dx/dx;
r1 = 2*(1 - r); r2 = 2*(1 + r);
for i = 1:M - 1
  A(i,i) = r2; %Eq.(9.2.17): diagonal 2(1 + r) of Eq.(9.2.16)
  if i > 1, A(i - 1,i) = -r; A(i,i - 1) = -r; end
end
for k = 2:N + 1
  b = [r*u(1,k); zeros(M - 3,1); r*u(M + 1,k)] ...
    + r*(u(1:M - 1,k - 1) + u(3:M + 1,k - 1)) + r1*u(2:M,k - 1);
  u(2:M,k) = trid(A,b); %Eq.(9.2.17)
end
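Here is a Python sketch of the Crank-Nicholson step (9.2.16) with Dirichlet values at both ends (hypothetical names solve_tridiag_const and heat_cn, standard library only). The tridiagonal system is solved with the Thomas algorithm. As in heat_CN(), the known boundary terms $r u^{k+1}_0$ and $r u^{k+1}_M$ are moved to the right-hand side.

```python
import math

def solve_tridiag_const(lo, di, up, rhs):
    """Thomas algorithm for a tridiagonal matrix with constant bands lo, di, up."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = up / di, rhs[0] / di
    for i in range(1, n):
        m = di - lo * c[i - 1]
        c[i], d[i] = up / m, (rhs[i] - lo * d[i - 1]) / m
    sol = [0.0] * n
    sol[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        sol[i] = d[i] - c[i] * sol[i + 1]
    return sol

def heat_cn(a, xf, T, it0, bx0, bxf, M, N):
    """Crank-Nicholson method, Eq. (9.2.16); u[k][i] approximates u(x[i], k*dt)."""
    dx, dt = xf / M, T / N
    x = [i * dx for i in range(M + 1)]
    r = a * dt / dx ** 2
    u = [[it0(xi) for xi in x]]
    for k in range(1, N + 1):
        prev = u[-1]
        left, right = bx0(k * dt), bxf(k * dt)
        rhs = [r * prev[i + 1] + 2 * (1 - r) * prev[i] + r * prev[i - 1]
               for i in range(1, M)]
        rhs[0] += r * left            # -r u^{k+1}_0 moved to the right-hand side
        rhs[-1] += r * right          # -r u^{k+1}_M moved to the right-hand side
        inner = solve_tridiag_const(-r, 2 * (1 + r), -r, rhs)
        u.append([left] + inner + [right])
    return x, u

# Example 9.2 with M = 25, N = 100 (r = 0.625)
x, u = heat_cn(1.0, 1.0, 0.1, lambda x: math.sin(math.pi * x),
               lambda t: 0.0, lambda t: 0.0, 25, 100)
```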


Example 9.2. One-Dimensional Parabolic PDE: Heat Flow Equation. Consider the parabolic PDE

$$\frac{\partial^2 u(x, t)}{\partial x^2} = \frac{\partial u(x, t)}{\partial t} \quad \text{for } 0 \le x \le 1,\ 0 \le t \le 0.1 \qquad (E9.2.1)$$

with the initial condition and the boundary conditions

$$u(x, 0) = \sin \pi x,\quad u(0, t) = 0,\quad u(1, t) = 0 \qquad (E9.2.2)$$


We made the MATLAB program “solve_heat.m” in order to use the routines “heat_exp()”, “heat_imp()”, and “heat_CN()” in solving this equation, and ran this program to obtain the results shown in Fig. 9.3. Note that with the spatial interval $\Delta x = x_f/M = 1/20$ and the time step $\Delta t = T/N = 0.1/100 = 0.001$, we have

$$r = A\frac{\Delta t}{\Delta x^2} = 1 \times \frac{0.001}{(1/20)^2} = 0.4 \qquad (E9.2.3)$$


which satisfies the stability condition ($r \le 1/2$) (9.2.6), and all three methods lead to reasonably fair results with a relative error of about 0.013. But if we decrease the spatial interval to $\Delta x = 1/25$ for better resolution, we have r = 0.625, which violates the stability condition. The explicit forward Euler method (“heat_exp()”) then blows up because of instability, as shown in Fig. 9.3a, while the implicit backward Euler method (“heat_imp()”) and the Crank-Nicholson method (“heat_CN()”) work quite well, as shown in Figs. 9.3b,c. Now, with the spatial interval $\Delta x = 1/25$ and the time step $\Delta t = 0.1/120$, the explicit method works well along with the other two, with a relative error less than 0.001, in return for somewhat (30%) more computations. This holds even though r = 0.5208 does not strictly satisfy the stability condition.

This implies that the condition ($r \le 1/2$) for stability of the explicit forward Euler method is not a necessary one, but only a sufficient one. Besides, if it converges, its accuracy may be better than that of the implicit backward Euler method, but generally no better than that of the Crank-Nicholson method.


%solve_heat
a = 1; %the parameter of (E9.2.1)
it0 = inline('sin(pi*x)','x'); %initial condition
bx0 = inline('0'); bxf = inline('0'); %boundary condition
xf = 1; M = 25; T = 0.1; N = 100; %r = 0.625
%analytical solution
uo = inline('sin(pi*x)*exp(-pi*pi*t)','x','t');
[u1,x,t] = heat_exp(a,xf,T,it0,bx0,bxf,M,N);
figure(1), clf, mesh(t,x,u1)
[u2,x,t] = heat_imp(a,xf,T,it0,bx0,bxf,M,N); %converge unconditionally
figure(2), clf, mesh(t,x,u2)
[u3,x,t] = heat_CN(a,xf,T,it0,bx0,bxf,M,N); %converge unconditionally
figure(3), clf, mesh(t,x,u3)
MN = M*N;
Uo = uo(x,t); aUo = abs(Uo)+eps; %values of true analytical solution
%How far from the analytical solution?
err1 = norm((u1-Uo)./aUo)/MN
err2 = norm((u2-Uo)./aUo)/MN
err3 = norm((u3-Uo)./aUo)/MN
9.2.4 Two-Dimensional Parabolic PDE

Another example of a parabolic PDE is a two-dimensional heat equation describing the temperature distribution u(x, y, t) ((x, y) is position, t is time) as

$$A\left(\frac{\partial^2 u(x, y, t)}{\partial x^2} + \frac{\partial^2 u(x, y, t)}{\partial y^2}\right) = \frac{\partial u(x, y, t)}{\partial t} \quad \text{for } x_0 \le x \le x_f,\ y_0 \le y \le y_f,\ 0 \le t \le T \qquad (9.2.19)$$

For this equation to be solvable, we must be provided with the boundary conditions

$$u(x_0, y, t) = b_{x0}(y, t),\quad u(x_f, y, t) = b_{xf}(y, t),\quad u(x, y_0, t) = b_{y0}(x, t),\quad u(x, y_f, t) = b_{yf}(x, t)$$

as well as the initial condition $u(x, y, 0) = i_0(x, y)$.

We replace the first-order time derivative on the right-hand side by the three-point central difference at the midpoint $(t_{k+1} + t_k)/2$, just as with the Crank-Nicholson method. We also replace one of the second-order derivatives, $u_{xx}$ and $u_{yy}$, by the three-point central difference approximation (5.3.1) at time $t_k$ and the other at time $t_{k+1}$, yielding

$$A\left(\frac{u^k_{i,j+1} - 2u^k_{i,j} + u^k_{i,j-1}}{\Delta x^2} + \frac{u^{k+1}_{i+1,j} - 2u^{k+1}_{i,j} + u^{k+1}_{i-1,j}}{\Delta y^2}\right) = \frac{u^{k+1}_{i,j} - u^k_{i,j}}{\Delta t} \qquad (9.2.20)$$

This seems attractive, since it can be formulated into a tridiagonal system of equations with respect to $u^{k+1}_{i+1,j}$, $u^{k+1}_{i,j}$, and $u^{k+1}_{i-1,j}$. But why should $u_{xx}$ and $u_{yy}$ be treated differently, with one always evaluated at time $t_k$ and the other at time $t_{k+1}$? Alternating the roles instead, we write the difference equation for the next time step (from $t_{k+1}$ to $t_{k+2}$) as

$$A\left(\frac{u^{k+2}_{i,j+1} - 2u^{k+2}_{i,j} + u^{k+2}_{i,j-1}}{\Delta x^2} + \frac{u^{k+1}_{i+1,j} - 2u^{k+1}_{i,j} + u^{k+1}_{i-1,j}}{\Delta y^2}\right) = \frac{u^{k+2}_{i,j} - u^{k+1}_{i,j}}{\Delta t} \qquad (9.2.21)$$

This formulation, proposed by Peaceman and Rachford [P-1], is referred to as the alternating direction implicit (ADI) method and can be cast into the following algorithm:

$$-r_y(u^{k+1}_{i-1,j} + u^{k+1}_{i+1,j}) + (1 + 2r_y)u^{k+1}_{i,j} = r_x(u^k_{i,j-1} + u^k_{i,j+1}) + (1 - 2r_x)u^k_{i,j} \quad \text{for } 1 \le j \le M_x - 1 \qquad (9.2.22a)$$

$$-r_x(u^{k+2}_{i,j-1} + u^{k+2}_{i,j+1}) + (1 + 2r_x)u^{k+2}_{i,j} = r_y(u^{k+1}_{i-1,j} + u^{k+1}_{i+1,j}) + (1 - 2r_y)u^{k+1}_{i,j} \quad \text{for } 1 \le i \le M_y - 1 \qquad (9.2.22b)$$

with

$$r_x = A\frac{\Delta t}{\Delta x^2},\quad r_y = A\frac{\Delta t}{\Delta y^2},\quad \Delta x = \frac{x_f - x_0}{M_x},\quad \Delta y = \frac{y_f - y_0}{M_y},\quad \Delta t = \frac{T}{N}$$

The objective of the following MATLAB routine “heat2_ADI()” is to implement this algorithm for solving a two-dimensional heat equation (9.2.19).

function [u,x,y,t] = heat2_ADI(a,D,T,ixy0,bxyt,Mx,My,N)
%solve u_t = a(u_xx + u_yy) for D(1) <= x <= D(2), D(3) <= y <= D(4), 0 <= t <= T
% Initial Condition: u(x,y,0) = ixy0(x,y)
% Boundary Condition: u(x,y,t) = bxyt(x,y,t) for (x,y) on B
% Mx/My = # of subintervals along x/y axis
% N = # of subintervals along t axis
dx = (D(2) - D(1))/Mx; x = D(1)+[0:Mx]*dx;
dy = (D(4) - D(3))/My; y = D(3)+[0:My]'*dy;
dt = T/N; t = [0:N]*dt;
%Initialization
for j = 1:Mx + 1
  for i = 1:My + 1
    u(i,j) = ixy0(x(j),y(i));
  end
end
rx = a*dt/(dx*dx); rx1 = 1 + 2*rx; rx2 = 1 - 2*rx;
ry = a*dt/(dy*dy); ry1 = 1 + 2*ry; ry2 = 1 - 2*ry;
for j = 1:Mx - 1 %Eq.(9.2.22a)
  Ay(j,j) = ry1;
  if j > 1, Ay(j - 1,j) = -ry; Ay(j,j-1) = -ry; end
end
for i = 1:My - 1 %Eq.(9.2.22b)
  Ax(i,i) = rx1;
  if i > 1, Ax(i - 1,i) = -rx; Ax(i,i - 1) = -rx; end
end

for i = 1:My + 1 %Boundary condition
  u(i,1) = feval(bxyt,x(1),y(i),t);
  u(i,Mx+1) = feval(bxyt,x(Mx+1),y(i),t);
end
for j = 1:Mx + 1
  u(1,j) = feval(bxyt,x(j),y(1),t);
Figure 9.4 A solution for a two-dimensional parabolic PDE obtained using “heat2_ADI()” (Example 9.3).


Example 9.3. A Parabolic PDE: Two-Dimensional Temperature Diffusion.

Consider a two-dimensional parabolic PDE

$$10^{-4}\left(\frac{\partial^2 u(x, y, t)}{\partial x^2} + \frac{\partial^2 u(x, y, t)}{\partial y^2}\right) = \frac{\partial u(x, y, t)}{\partial t} \quad \text{for } 0 \le x \le 4,\ 0 \le y \le 4,\ 0 \le t \le 5000 \qquad (E9.3.1)$$

with the initial conditions and boundary conditions

$$u(x, y, 0) = 0 \quad \text{for } t = 0 \qquad (E9.3.2a)$$

$$u(x, y, t) = e^y \cos x - e^x \cos y \quad \text{for } x = 0,\ x = 4,\ y = 0,\ y = 4 \qquad (E9.3.2b)$$

We made the following MATLAB program “solve_heat2.m” in order to use the routine “heat2_ADI()” to solve this equation, and ran this program to get the result shown in Fig. 9.4 at the final time.


%solve_heat2
clear, clf
a = 1e-4;
it0 = inline('0','x','y'); %(E9.3.2a)
bxyt = inline('exp(y)*cos(x)-exp(x)*cos(y)','x','y','t'); %(E9.3.2b)
D = [0 4 0 4]; T = 5000; Mx = 40; My = 40; N = 50;
[u,x,y,t] = heat2_ADI(a,D,T,it0,bxyt,Mx,My,N);
mesh(x,y,u)


9.3 HYPERBOLIC PDE 

An example of a hyperbolic PDE is a one-dimensional wave equation for the amplitude function u(x, t) (x is position, t is time) as

$$A\frac{\partial^2 u(x, t)}{\partial x^2} = \frac{\partial^2 u(x, t)}{\partial t^2} \quad \text{for } 0 \le x \le x_f,\ 0 \le t \le T \qquad (9.3.1)$$

For this equation to be solvable, the boundary conditions $u(0, t) = b_0(t)$ and $u(x_f, t) = b_{xf}(t)$, as well as the initial conditions $u(x, 0) = i_0(x)$ and $\partial u/\partial t|_{t=0}(x, 0) = i'_0(x)$, must be provided.


9.3.1 The Explicit Central Difference Method 

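In the same way as with the parabolic PDEs, we replace the second derivatives on both sides of Eq. (9.3.1) by their three-point central difference approximation (5.3.1) as

$$A\frac{u^k_{i+1} - 2u^k_i + u^k_{i-1}}{\Delta x^2} = \frac{u^{k+1}_i - 2u^k_i + u^{k-1}_i}{\Delta t^2} \quad \text{with } \Delta x = \frac{x_f}{M},\ \Delta t = \frac{T}{N} \qquad (9.3.2)$$

which leads to the explicit central difference method:

$$u^{k+1}_i = r(u^k_{i+1} + u^k_{i-1}) + 2(1 - r)u^k_i - u^{k-1}_i \quad \text{with } r = A\frac{\Delta t^2}{\Delta x^2} \qquad (9.3.3)$$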


Since $u^{-1}_i = u(x_i, -\Delta t)$ is not given, we cannot get $u^1_i$ directly from this formula (9.3.3) with k = 0:

$$u^1_i = r(u^0_{i+1} + u^0_{i-1}) + 2(1 - r)u^0_i - u^{-1}_i \qquad (9.3.4)$$

Therefore, we approximate the initial condition on the derivative by the central difference as

$$\frac{u^1_i - u^{-1}_i}{2\Delta t} = i'_0(x_i) \qquad (9.3.5)$$

and make use of this to remove $u^{-1}_i$ from Eq. (9.3.4):

$$u^1_i = r(u^0_{i+1} + u^0_{i-1}) + 2(1 - r)u^0_i - (u^1_i - 2i'_0(x_i)\Delta t)$$

$$u^1_i = \frac{1}{2}r(u^0_{i+1} + u^0_{i-1}) + (1 - r)u^0_i + i'_0(x_i)\Delta t \qquad (9.3.6)$$

We use Eq. (9.3.6) together with the initial conditions to get $u^1_i$, and then go on with Eq. (9.3.3) for k = 1, 2, .... Note the following facts:


• We must have $r \le 1$ to guarantee stability.

• The accuracy of the solution gets better as r becomes larger, since for a fixed Δt a larger r means a smaller Δx.

It is therefore reasonable to select r = 1.

The stability condition can be obtained by substituting Eq. (9.2.4) into Eq. (9.3.3) and applying the Jury test [P-3]:

$$\lambda = 2r\cos(\pi/P) + 2(1 - r) - \lambda^{-1}, \qquad \lambda^2 + 2(r(1 - \cos(\pi/P)) - 1)\lambda + 1 = 0$$
We need the solutions of this equation to lie inside or on the unit circle for stability, which requires

$$r \le \frac{2}{1 - \cos(\pi/P)} \qquad (9.3.7)$$

Since $1 - \cos(\pi/P) \le 2$, this holds for every P if $r \le 1$.
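Condition (9.3.7) can be checked numerically. The Python sketch below (hypothetical name max_root_modulus, standard library only) computes the larger root modulus of the quadratic above for a given r and P. For $r \le 1$ it should not exceed 1 apart from round-off. For r > 1, the highest-frequency mode P = 1, for which $1 - \cos(\pi/P) = 2$, already has a root outside the unit circle.

```python
import cmath
import math

def max_root_modulus(r, P):
    """Largest |lambda| for lambda^2 + 2(r(1 - cos(pi/P)) - 1) lambda + 1 = 0."""
    b = 2.0 * (r * (1.0 - math.cos(math.pi / P)) - 1.0)
    sq = cmath.sqrt(b * b - 4.0)
    return max(abs((-b + sq) / 2.0), abs((-b - sq) / 2.0))

for r in (0.64, 1.0, 1.2):
    worst = max(max_root_modulus(r, P) for P in range(1, 51))
    print(r, round(worst, 6))
```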


The objective of the following MATLAB routine “wave()” is to implement this algorithm for solving a one-dimensional wave equation.


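Example 9.4. A Hyperbolic PDE: One-Dimensional Wave (Vibration). Consider a one-dimensional hyperbolic PDE

$$\frac{\partial^2 u(x, t)}{\partial x^2} = \frac{\partial^2 u(x, t)}{\partial t^2} \quad \text{for } 0 \le x \le 1,\ 0 \le t \le 2 \qquad (E9.4.1)$$

with the initial conditions and boundary conditions

$$u(x, 0) = x(1 - x),\quad \frac{\partial u}{\partial t}(x, 0) = 0 \quad \text{for } t = 0 \qquad (E9.4.2a)$$

$$u(0, t) = 0 \text{ for } x = 0,\quad u(1, t) = 0 \text{ for } x = 1 \qquad (E9.4.2b)$$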


We made the following MATLAB program “solve_wave.m” in order to use the routine “wave()” to solve this equation, and ran this program to get the result shown in Fig. 9.5 and see a dynamic picture.


function [u,x,t] = wave(a,xf,T,it0,i1t0,bx0,bxf,M,N)
%solve a u_xx = u_tt for 0<=x<=xf, 0<=t<=T
% Initial Condition: u(x,0) = it0(x), u_t(x,0) = i1t0(x)
% Boundary Condition: u(0,t) = bx0(t), u(xf,t) = bxf(t)
% M = # of subintervals along x axis
% N = # of subintervals along t axis
dx = xf/M; x = [0:M]'*dx;
dt = T/N; t = [0:N]*dt;
for i = 1:M + 1, u(i,1) = it0(x(i)); end
for k = 1:N + 1
  u([1 M + 1],k) = [bx0(t(k)); bxf(t(k))];
end
r = a*(dt/dx)^2; r1 = r/2; r2 = 2*(1 - r);
u(2:M,2) = r1*u(1:M - 1,1) + (1 - r)*u(2:M,1) + r1*u(3:M + 1,1) ...
  + dt*i1t0(x(2:M)); %Eq.(9.3.6)
for k = 3:N + 1
  u(2:M,k) = r*u(1:M - 1,k - 1) + r2*u(2:M,k - 1) + r*u(3:M + 1,k - 1) ...
    - u(2:M,k - 2); %Eq.(9.3.3)
end


%solve_wave
a = 1; %the parameter of (E9.4.1)
it0 = inline('x.*(1-x)','x'); i1t0 = inline('0'); %(E9.4.2a)
bx0t = inline('0'); bxft = inline('0'); %(E9.4.2b)
xf = 1; M = 20; T = 2; N = 50;
[u,x,t] = wave(a,xf,T,it0,i1t0,bx0t,bxft,M,N);
figure(1), clf
mesh(t,x,u)
figure(2), clf
for n = 1:N %dynamic picture
  plot(x,u(:,n)), axis([0 xf -0.3 0.3]), pause(0.2)
end
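A Python sketch of the same explicit scheme follows (hypothetical name wave_explicit, standard library only). The first step uses Eq. (9.3.6) and later steps use Eq. (9.3.3). As a check, take A = 1 and r = 1, that is, Δt = Δx. The scheme then reproduces d'Alembert's solution at the grid points, so for $u(x, 0) = \sin \pi x$ with zero initial velocity the result should match $\sin \pi x \cos \pi t$ up to round-off.

```python
import math

def wave_explicit(a, xf, T, it0, i1t0, bx0, bxf, M, N):
    """Explicit central difference method for a u_xx = u_tt.

    Returns (x, u, r) with u[k][i] approximating u(x[i], k*dt).
    """
    dx, dt = xf / M, T / N
    x = [i * dx for i in range(M + 1)]
    r = a * (dt / dx) ** 2                       # need r <= 1 for stability
    u0 = [it0(xi) for xi in x]
    u0[0], u0[M] = bx0(0.0), bxf(0.0)
    u1 = ([bx0(dt)]
          + [0.5 * r * (u0[i + 1] + u0[i - 1]) + (1 - r) * u0[i] + dt * i1t0(x[i])
             for i in range(1, M)]               # Eq. (9.3.6)
          + [bxf(dt)])
    u = [u0, u1]
    for k in range(2, N + 1):
        um1, uk = u[-2], u[-1]
        u.append([bx0(k * dt)]
                 + [r * (uk[i + 1] + uk[i - 1]) + 2 * (1 - r) * uk[i] - um1[i]
                    for i in range(1, M)]        # Eq. (9.3.3)
                 + [bxf(k * dt)])
    return x, u, r

x, u, r = wave_explicit(1.0, 1.0, 1.0, lambda x: math.sin(math.pi * x),
                        lambda x: 0.0, lambda t: 0.0, lambda t: 0.0, 20, 20)
```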
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9.3.2 Two-Dimensional Hyperbolic PDE 

In this section, we consider a two-dimensional wave equation for the amplitude function u(x, y, t) ((x, y) is position, t is time) as

$$ A\left(\frac{\partial^2 u(x,y,t)}{\partial x^2} + \frac{\partial^2 u(x,y,t)}{\partial y^2}\right) = \frac{\partial^2 u(x,y,t)}{\partial t^2} \quad \text{for } 0 \le x \le x_f,\ 0 \le y \le y_f,\ 0 \le t \le T $$

In order for this equation to be solvable, we should be provided with the boundary conditions

$$ u(0,y,t) = b_{x0}(y,t), \quad u(x_f,y,t) = b_{xf}(y,t), \quad u(x,0,t) = b_{y0}(x,t), \quad u(x,y_f,t) = b_{yf}(x,t) $$

as well as the initial conditions $u(x,y,0) = i_0(x,y)$ and $\partial u/\partial t|_{t=0}(x,y,0) = i'_0(x,y)$.


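In the same way as with the one-dimensional case, we replace the second derivatives on both sides by their three-point central difference approximation (5.3.1) as

$$ A\left(\frac{u^k_{i,j+1} - 2u^k_{i,j} + u^k_{i,j-1}}{\Delta x^2} + \frac{u^k_{i+1,j} - 2u^k_{i,j} + u^k_{i-1,j}}{\Delta y^2}\right) = \frac{u^{k+1}_{i,j} - 2u^k_{i,j} + u^{k-1}_{i,j}}{\Delta t^2} $$

with $\Delta x = x_f/M_x$, $\Delta y = y_f/M_y$, $\Delta t = T/N$,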


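which leads to the explicit central difference method:

$$ u^{k+1}_{i,j} = r_x\left(u^k_{i,j+1} + u^k_{i,j-1}\right) + 2(1 - r_x - r_y)u^k_{i,j} + r_y\left(u^k_{i+1,j} + u^k_{i-1,j}\right) - u^{k-1}_{i,j} \qquad (9.3.10) $$

with

$$ r_x = A\frac{\Delta t^2}{\Delta x^2}, \qquad r_y = A\frac{\Delta t^2}{\Delta y^2} $$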
Since $u^{-1}_{i,j} = u(x_j, y_i, -\Delta t)$ is not given, we cannot get $u^1_{i,j}$ directly from formula (9.3.10) with k = 0:

$$ u^1_{i,j} = r_x\left(u^0_{i,j+1} + u^0_{i,j-1}\right) + 2(1 - r_x - r_y)u^0_{i,j} + r_y\left(u^0_{i+1,j} + u^0_{i-1,j}\right) - u^{-1}_{i,j} \qquad (9.3.11) $$

Therefore, we approximate the initial condition on the derivative by the central difference as

$$ \frac{u^1_{i,j} - u^{-1}_{i,j}}{2\Delta t} = i'_0(x_j, y_i) \qquad (9.3.12) $$


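and make use of this to remove $u^{-1}_{i,j}$ from Eq. (9.3.11) to have

$$ u^1_{i,j} = \frac{1}{2}\left\{ r_x\left(u^0_{i,j+1} + u^0_{i,j-1}\right) + r_y\left(u^0_{i+1,j} + u^0_{i-1,j}\right) + 2(1 - r_x - r_y)u^0_{i,j} \right\} + i'_0(x_j, y_i)\,\Delta t \qquad (9.3.13) $$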


We use this Eq. (9.3.13) together with the initial conditions to get $u^1_{i,j}$ and then go on using Eq. (9.3.10) for k = 1, 2, .... A sufficient condition for stability [S-1, Section 9.6] is

$$ r = \frac{4A\,\Delta t^2}{\Delta x^2 + \Delta y^2} \le 1 \qquad (9.3.14) $$


The MATLAB routine “wave2()” implements this algorithm for solving a two-dimensional wave equation.

Example 9.5. A Hyperbolic PDE: Two-Dimensional Wave (Vibration) Over a Square Membrane. Consider a two-dimensional hyperbolic PDE

$$ \frac{1}{4}\left(\frac{\partial^2 u(x,y,t)}{\partial x^2} + \frac{\partial^2 u(x,y,t)}{\partial y^2}\right) = \frac{\partial^2 u(x,y,t)}{\partial t^2} \quad \text{for } 0 \le x \le 2,\ 0 \le y \le 2 \text{ and } 0 \le t \le 2 \qquad (E9.5.1) $$

with the zero boundary conditions and the initial conditions

$$ u(0,y,t) = 0, \quad u(2,y,t) = 0, \quad u(x,0,t) = 0, \quad u(x,2,t) = 0 \qquad (E9.5.2) $$
$$ u(x,y,0) = 0.1\sin(\pi x)\sin(\pi y/2), \quad \frac{\partial u}{\partial t}(x,y,0) = 0 \ \text{for } t = 0 \qquad (E9.5.3) $$


We made the following MATLAB program “solve_wave2.m” in order to use the routine “wave2()” for solving this equation and ran this program to get the result shown in Fig. 9.6 and see a dynamic picture. Note that we can be sure of stability, since we have

$$ r = \frac{4A\,\Delta t^2}{\Delta x^2 + \Delta y^2} = \frac{4(1/4)(2/20)^2}{(2/20)^2 + (2/20)^2} = \frac{1}{2} \le 1 $$
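As a rough stand-in for the routine “wave2()”, here is a minimal script (not the book's routine; all variable names are illustrative) that applies the first step (9.3.13) and the recursion (9.3.10) to Example 9.5 with $M_x = M_y = N = 20$. It compares the result at t = T with the separable exact solution $u = 0.1\sin(\pi x)\sin(\pi y/2)\cos(\omega t)$, $\omega = \pi\sqrt{5A/4}$, which follows from substituting this form into (E9.5.1).

```matlab
% Minimal sketch of the explicit scheme (9.3.10) with first step (9.3.13),
% applied to Example 9.5 (A = 1/4, zero boundary values, zero initial velocity).
A = 1/4; xf = 2; yf = 2; T = 2;
Mx = 20; My = 20; N = 20;                 % numbers of subintervals along x, y, t
dx = xf/Mx; dy = yf/My; dt = T/N;
x = (0:Mx)*dx;  y = (0:My)'*dy;           % x as a row, y as a column
rx = A*dt^2/dx^2; ry = A*dt^2/dy^2;
r = 4*A*dt^2/(dx^2 + dy^2);               % stability measure of Eq.(9.3.14)
u = zeros(My+1, Mx+1, N+1);               % u(i,j,k) ~ u(x_j, y_i, t_k)
u(:,:,1) = 0.1*sin(pi*y/2)*sin(pi*x);     % (E9.5.3); boundary values are ~0
v0 = zeros(My+1, Mx+1);                   % initial velocity i0'(x,y) = 0
I = 2:My; J = 2:Mx;                       % interior indices along y and x
u0 = u(:,:,1);
u(I,J,2) = 0.5*(rx*(u0(I,J+1) + u0(I,J-1)) + ry*(u0(I+1,J) + u0(I-1,J)) ...
              + 2*(1 - rx - ry)*u0(I,J)) + dt*v0(I,J);              % Eq.(9.3.13)
for k = 2:N
  uk = u(:,:,k); ukm1 = u(:,:,k-1);
  u(I,J,k+1) = rx*(uk(I,J+1) + uk(I,J-1)) + 2*(1 - rx - ry)*uk(I,J) ...
             + ry*(uk(I+1,J) + uk(I-1,J)) - ukm1(I,J);              % Eq.(9.3.10)
end
% Separable exact solution of (E9.5.1)-(E9.5.3), used only as a reference
omega = pi*sqrt(A*(1 + 1/4));
u_exact = 0.1*sin(pi*y/2)*sin(pi*x)*cos(omega*T);
max_err = max(max(abs(u(:,:,end) - u_exact)));
fprintf('r = %g, max error at t = T: %g\n', r, max_err);
```

Because the initial shape is a single eigenmode of the discrete Laplacian, the only error here is a small phase lag, so the deviation should stay well below the 0.1 amplitude.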
9.4 FINITE ELEMENT METHOD (FEM) FOR SOLVING PDE

The FEM is another procedure used in finding approximate numerical solutions to BVPs/PDEs. It can handle irregular boundaries in the same way as regular boundaries [R-1, S-2, Z-1]. It consists of the following steps to solve the elliptic PDE:

$$ \frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} + g(x,y)u(x,y) = f(x,y) \qquad (9.4.1) $$


for the domain D enclosed by the boundary B on which the boundary condition 
is given as 

u(x, y) = b(x, y) on the boundary B (9.4.2) 

1. Discretize the (two-dimensional) domain D into, say, $N_s$ subregions $\{S_1, S_2, \ldots, S_{N_s}\}$ such as triangular elements, neither necessarily of the same size nor necessarily covering the entire domain completely and exactly.

2. Specify the positions of $N_n$ nodes and number them starting from the boundary nodes, say, $n = 1, \ldots, N_b$, and then the interior nodes, say, $n = N_b + 1, \ldots, N_n$.

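3. Define the basis/shape/interpolation functions

$$ \phi_n(x,y) = \{\phi_{n,s}(x,y),\ \text{for } s = 1, \ldots, N_s\} \quad \forall (x,y) \in D \qquad (9.4.3a) $$
$$ \phi_{n,s}(x,y) = p_{n,s}(1) + p_{n,s}(2)x + p_{n,s}(3)y \quad \text{for each subregion } S_s \qquad (9.4.3b) $$

collectively for all subregions s = 1 : $N_s$ and for each node n = 1 : $N_n$, so that $\phi_n$ is 1 only at node n, and 0 at all other nodes. Then, the approximate solution of the PDE is a linear combination of basis functions $\phi_n(x,y)$ as

$$ u(x,y) = \mathbf{c}^T\boldsymbol{\phi}(x,y) = \sum_{n=1}^{N_n} c_n\phi_n(x,y) = \sum_{n=1}^{N_b} c_n\phi_n(x,y) + \sum_{n=N_b+1}^{N_n} c_n\phi_n(x,y) = \mathbf{c}_1^T\boldsymbol{\phi}_1 + \mathbf{c}_2^T\boldsymbol{\phi}_2 \qquad (9.4.4) $$

where

$$ \boldsymbol{\phi}_1 = [\phi_1\ \phi_2\ \cdots\ \phi_{N_b}]^T, \quad \mathbf{c}_1 = [c_1\ c_2\ \cdots\ c_{N_b}]^T \qquad (9.4.5a) $$
$$ \boldsymbol{\phi}_2 = [\phi_{N_b+1}\ \phi_{N_b+2}\ \cdots\ \phi_{N_n}]^T, \quad \mathbf{c}_2 = [c_{N_b+1}\ c_{N_b+2}\ \cdots\ c_{N_n}]^T \qquad (9.4.5b) $$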
For each subregion s = 1, ..., $N_s$, this solution can be written as

$$ \phi_s(x,y) = \sum_{n=1}^{N_n} c_n\phi_{n,s}(x,y) = \sum_{n=1}^{N_n} c_n\left(p_{n,s}(1) + p_{n,s}(2)x + p_{n,s}(3)y\right) \qquad (9.4.6) $$

4. Set the values of the boundary node coefficients in $\mathbf{c}_1$ to the boundary values according to the boundary condition.

5. Determine the values of the interior node coefficients in $\mathbf{c}_2$ by solving the system of equations

$$ A_2\mathbf{c}_2 = \mathbf{d} \qquad (9.4.7) $$


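where

$$ A_1 = \sum_{s=1}^{N_s}\left\{ \frac{\partial\boldsymbol{\phi}_{2,s}}{\partial x}\left(\frac{\partial\boldsymbol{\phi}_{1,s}}{\partial x}\right)^T + \frac{\partial\boldsymbol{\phi}_{2,s}}{\partial y}\left(\frac{\partial\boldsymbol{\phi}_{1,s}}{\partial y}\right)^T - g(x_s,y_s)\boldsymbol{\phi}_{2,s}\boldsymbol{\phi}_{1,s}^T \right\}\Delta S_s \qquad (9.4.8) $$

$$ A_2 = \sum_{s=1}^{N_s}\left\{ \frac{\partial\boldsymbol{\phi}_{2,s}}{\partial x}\left(\frac{\partial\boldsymbol{\phi}_{2,s}}{\partial x}\right)^T + \frac{\partial\boldsymbol{\phi}_{2,s}}{\partial y}\left(\frac{\partial\boldsymbol{\phi}_{2,s}}{\partial y}\right)^T - g(x_s,y_s)\boldsymbol{\phi}_{2,s}\boldsymbol{\phi}_{2,s}^T \right\}\Delta S_s \qquad (9.4.9) $$

$$ \mathbf{d} = -A_1\mathbf{c}_1 - \sum_{s=1}^{N_s} f(x_s,y_s)\boldsymbol{\phi}_{2,s}\,\Delta S_s \qquad (9.4.10) $$

with

$$ \boldsymbol{\phi}_{1,s} = [\phi_{1,s}\ \phi_{2,s}\ \cdots\ \phi_{N_b,s}]^T, \quad \frac{\partial\boldsymbol{\phi}_{1,s}}{\partial x} = [p_{1,s}(2)\ p_{2,s}(2)\ \cdots\ p_{N_b,s}(2)]^T, \quad \frac{\partial\boldsymbol{\phi}_{1,s}}{\partial y} = [p_{1,s}(3)\ p_{2,s}(3)\ \cdots\ p_{N_b,s}(3)]^T $$
$$ \boldsymbol{\phi}_{2,s} = [\phi_{N_b+1,s}\ \phi_{N_b+2,s}\ \cdots\ \phi_{N_n,s}]^T, \quad \frac{\partial\boldsymbol{\phi}_{2,s}}{\partial x} = [p_{N_b+1,s}(2)\ \cdots\ p_{N_n,s}(2)]^T, \quad \frac{\partial\boldsymbol{\phi}_{2,s}}{\partial y} = [p_{N_b+1,s}(3)\ \cdots\ p_{N_n,s}(3)]^T $$

$(x_s, y_s)$: the centroid (gravity center) of the sth subregion $S_s$, at which the basis functions and g, f are evaluated; $\Delta S_s$: the area of $S_s$.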


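The FEM is based on the variational principle that a solution to Eq. (9.4.1) can be obtained by minimizing the functional

$$ I = \iint_D \left\{ \left(\frac{\partial u(x,y)}{\partial x}\right)^2 + \left(\frac{\partial u(x,y)}{\partial y}\right)^2 - g(x,y)u^2(x,y) + 2f(x,y)u(x,y) \right\} dx\,dy \qquad (9.4.11) $$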
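which, with $u(x,y) = \mathbf{c}^T\boldsymbol{\phi}(x,y)$, can be written as

$$ I = \iint_D \left\{ \mathbf{c}^T\frac{\partial\boldsymbol{\phi}}{\partial x}\frac{\partial\boldsymbol{\phi}^T}{\partial x}\mathbf{c} + \mathbf{c}^T\frac{\partial\boldsymbol{\phi}}{\partial y}\frac{\partial\boldsymbol{\phi}^T}{\partial y}\mathbf{c} - g(x,y)\mathbf{c}^T\boldsymbol{\phi}\boldsymbol{\phi}^T\mathbf{c} + 2f(x,y)\mathbf{c}^T\boldsymbol{\phi} \right\} dx\,dy \qquad (9.4.12) $$

The condition for this functional to be minimized with respect to the interior-node coefficients $\mathbf{c}_2$ (the boundary-node coefficients $\mathbf{c}_1$ being fixed by the boundary condition) is

$$ \iint_D \left\{ \frac{\partial\boldsymbol{\phi}_2}{\partial x}\frac{\partial\boldsymbol{\phi}^T}{\partial x}\mathbf{c} + \frac{\partial\boldsymbol{\phi}_2}{\partial y}\frac{\partial\boldsymbol{\phi}^T}{\partial y}\mathbf{c} - g(x,y)\boldsymbol{\phi}_2\boldsymbol{\phi}^T\mathbf{c} + f(x,y)\boldsymbol{\phi}_2 \right\} dx\,dy = \mathbf{0} \qquad (9.4.13) $$

which, with the integrand evaluated at the centroid of each subregion, becomes

$$ A_1\mathbf{c}_1 + A_2\mathbf{c}_2 + \sum_{s=1}^{N_s} f(x_s,y_s)\boldsymbol{\phi}_{2,s}\,\Delta S_s = \mathbf{0} \qquad (9.4.14) $$

This is just Eq. (9.4.7) with $\mathbf{d}$ given by Eq. (9.4.10). See [R-1] for details.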
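With g = 0 and f = 0, Eqs. (9.4.8)-(9.4.10) reduce to blocks of the matrix of integrals of $\nabla\phi_i\cdot\nabla\phi_n$, and linear triangular elements should then reproduce any linear solution exactly (the so-called patch test). The sketch below assembles that matrix for the five-node mesh of Fig. 9.7 (node 5 is its only interior node), imposes u = x + y on the boundary nodes, and solves $A_2\mathbf{c}_2 = -A_1\mathbf{c}_1$. It computes the gradients from the inverse of the 3×3 vertex matrix instead of calling “fem_basis_ftn()”, so it is an independent illustration rather than the book's routine; the choice u = x + y is arbitrary.

```matlab
% Patch test for linear triangular elements on the mesh of Fig. 9.7:
% u = x + y solves u_xx + u_yy = 0 (g = 0, f = 0), so the FEM value at the
% interior node 5 should equal 0.2 + 0.5 = 0.7 up to rounding.
N = [-1 1; 1 1; 1 -1; -1 -1; 0.2 0.5];    % node coordinates
S = [1 2 5; 2 3 5; 3 4 5; 1 4 5];         % node numbers of the triangles
N_b = 4;                                  % nodes 1-4 are boundary nodes
N_n = size(N,1);
K = zeros(N_n);                           % K(i,n) = integral of grad(phi_i).grad(phi_n)
for s = 1:size(S,1)
  nd = S(s,:);
  V = [ones(3,1) N(nd,:)];                % rows [1 x y] of the three vertices
  P = inv(V);                             % column m: coefficients of local basis m
  area = abs(det(V))/2;
  G = P(2:3,:);                           % constant gradients of the 3 basis ftns
  K(nd,nd) = K(nd,nd) + area*(G'*G);
end
c = zeros(N_n,1);
c(1:N_b) = N(1:N_b,1) + N(1:N_b,2);       % boundary values from u = x + y
inner = N_b+1:N_n;
A1 = K(inner,1:N_b); A2 = K(inner,inner); % blocks as in Eqs.(9.4.8),(9.4.9)
c(inner) = A2 \ (-A1*c(1:N_b));           % Eq.(9.4.7) with d = -A1*c1 (f = 0)
fprintf('c(5) = %.15f\n', c(5));
```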

The objectives of the MATLAB routines “fem_basis_ftn()” and “fem_coef()” are to construct the basis functions $\phi_{n,s}(x,y)$ for each node n = 1, ..., $N_n$ and each subregion s = 1, ..., $N_s$, and to get the coefficient vector c of the solution (9.4.4) via Eq. (9.4.7) and the solution polynomials $\phi_s(x,y)$ via Eq. (9.4.6) for each subregion s = 1, ..., $N_s$, respectively.

Before going into a specific example of applying the FEM to solve a PDE, let us take a look at the basis (shape) function $\phi_n(x,y)$ for each node n = 1, ..., $N_n$, which is defined collectively for all of the (triangular) subregions so that $\phi_n$ is 1 only at node n and 0 at all other nodes, and which can be generated by the routine “fem_basis_ftn()”.


function p = fem_basis_ftn(N,S)
%p(i,s,1:3): coefficients of each basis ftn phi_i
%            for s-th subregion(triangle)
%N(n,1:2) : x & y coordinates of the n-th node
%S(s,1:3) : the node #s of the s-th subregion(triangle)
N_n = size(N,1); % the total number of nodes
N_s = size(S,1); % the total number of subregions(triangles)
p = zeros(N_n,N_s,3); A = zeros(3,3); b = zeros(3,1);
for n = 1:N_n
   for s = 1:N_s
      for i = 1:3
         A(i,1:3) = [1 N(S(s,i),1:2)];
         b(i) = (S(s,i) == n); %The nth basis ftn is 1 only at node n.
      end
      pnt = A\b;
      for i = 1:3, p(n,s,i) = pnt(i); end
   end
end
function [U,c] = fem_coef(f,g,p,c,N,S,N_i)
%p(i,s,1:3): coefficients of basis ftn phi_i for the s-th subregion
%c(1:N_n)  : boundary values for boundary nodes and 0 for interior nodes
%N(n,1:2)  : x & y coordinates of the n-th node
%S(s,1:3)  : the node #s of the s-th subregion(triangle)
%N_i       : the number of the interior nodes
%U(s,1:3)  : the coefficients of p1(s) + p2(s)x + p3(s)y for each subregion
N_n = size(N,1); % the total number of nodes = N_b + N_i
N_s = size(S,1); % the total number of subregions(triangles)
N_b = N_n - N_i; % the number of boundary nodes
d = zeros(N_i,1); A12 = zeros(N_i,N_n);
tmpg = zeros(1,N_s); tmpf = zeros(1,N_s); dS = zeros(1,N_s);
for i = N_b + 1:N_n %interior nodes only
   for n = 1:N_n
      for s = 1:N_s
         xy = (N(S(s,1),:) + N(S(s,2),:) + N(S(s,3),:))/3; %gravity center
         %phi_i,x*phi_n,x + phi_i,y*phi_n,y - g(x,y)*phi_i*phi_n
         p_vctr = [p([i n],s,1) p([i n],s,2) p([i n],s,3)];
         tmpg(s) = sum(p(i,s,2:3).*p(n,s,2:3)) ...
                  - g(xy(1),xy(2))*p_vctr(1,:)*[1 xy]'*p_vctr(2,:)*[1 xy]';
         dS(s) = det([N(S(s,1),:) 1; N(S(s,2),:) 1; N(S(s,3),:) 1])/2;
         %area of triangular subregion
         if n == 1, tmpf(s) = -f(xy(1),xy(2))*p_vctr(1,:)*[1 xy]'; end
      end
      A12(i - N_b,n) = tmpg*abs(dS)'; %Eqs.(9.4.8),(9.4.9)
   end
   d(i - N_b) = tmpf*abs(dS)'; %Eq.(9.4.10)
end
d = d - A12(1:N_i,1:N_b)*c(1:N_b)'; %Eq.(9.4.10)
c(N_b + 1:N_n) = A12(1:N_i,N_b + 1:N_n)\d; %Eq.(9.4.7)
for s = 1:N_s
   for j = 1:3, U(s,j) = c*p(:,s,j); end %Eq.(9.4.6)
end


Now, we will plot the basis (shape) functions for the region divided into four triangular subregions as depicted in Fig. 9.7 in two ways. First, we generate the basis functions by using the routine “fem_basis_ftn()” and plot one of them for node 1 by using the MATLAB command “mesh()”, as depicted in Fig. 9.8a. Second, without generating the basis functions, we use the MATLAB command “trimesh()” to plot the shape functions for nodes n = 2, 3, 4, and 5 as depicted in Figs. 9.8b-e, each of which is 1 only at the corresponding node n and is 0 at all other nodes. Figure 9.8f is the graph of a linear combination of basis functions

$$ u(x,y) = \mathbf{c}^T\boldsymbol{\phi}(x,y) = \sum_{n=1}^{N_n} c_n\phi_n(x,y) \qquad (9.4.15) $$

having the given value $c_n$ at each node n. This can be obtained by using the MATLAB command “trimesh()” as

>>trimesh(S,N(:,1),N(:,2),c)

where the first input argument S has the node numbers for each subregion, the second/third input arguments N(:,1)/N(:,2) have the x/y coordinates of each node, and the fourth input argument c has the function values at each node as follows:

$$ S = \begin{bmatrix} 1 & 2 & 5 \\ 2 & 3 & 5 \\ 3 & 4 & 5 \\ 1 & 4 & 5 \end{bmatrix}, \quad N = \begin{bmatrix} -1 & 1 \\ 1 & 1 \\ 1 & -1 \\ -1 & -1 \\ 0.2 & 0.5 \end{bmatrix}, \quad \mathbf{c} = \begin{bmatrix} 0 \\ 1 \\ 2 \\ 3 \\ 0 \end{bmatrix} $$

Figure 9.7 A region (domain) divided into four triangular subregions.


For this job, we make the following program “show_basis.m” and run it to get Figs. 9.7 and 9.8 together with the coefficients of each basis function as

The meaning of this $N_n$ (the number of nodes: 5) × $N_s$ (the number of subregions: 4) × 3 array p is that, say, the second rows of the three sub-arrays constitute
the coefficient vectors of the basis function for node 2 as

$$ \phi_{2,s}(x,y) = \begin{cases} -7/10 + (1/2)x + (6/5)y & \text{for subregion } S_1 \\ -7/16 + (15/16)x + (1/2)y & \text{for subregion } S_2 \\ 0 + 0\cdot x + 0\cdot y & \text{for subregion } S_3 \\ 0 + 0\cdot x + 0\cdot y & \text{for subregion } S_4 \end{cases} \qquad (9.4.18) $$

which turns out to be 1 only at node 2 [i.e., (1, 1)] and 0 at all other nodes and on the subregions that do not have node 2 as their vertex, as depicted in Fig. 9.8b.

Figure 9.8 The basis (shape) functions for nodes in Fig. 9.7 and a composite function ((e) $\phi_5(x,y)$; (f) $\phi_2(x,y) + 2\phi_3(x,y) + 3\phi_4(x,y)$).
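The values in Eq. (9.4.18) can be reproduced independently: on each triangle, $\phi_2$ is the plane that equals 1 at node 2 and 0 at the other two vertices, so its three coefficients solve a 3×3 linear system (the same system that “fem_basis_ftn()” sets up). A minimal check for $S_1$ and $S_2$:

```matlab
% Coefficients [p(1) p(2) p(3)] of phi_2 on S1 = {1,2,5} and S2 = {2,3,5} of Fig. 9.7,
% from p(1) + p(2)*x + p(3)*y = 1 at node 2 and 0 at the other two vertices.
N = [-1 1; 1 1; 1 -1; -1 -1; 0.2 0.5];    % node coordinates
S = [1 2 5; 2 3 5; 3 4 5; 1 4 5];         % subregions
n = 2;                                    % node whose basis function is wanted
coef = zeros(2,3);
for s = 1:2
  A = [ones(3,1) N(S(s,:),:)];            % rows [1 x y] of the vertices
  b = double(S(s,:)' == n);               % 1 at node n, 0 at the others
  coef(s,:) = (A\b)';
end
disp(coef)   % row 1 should match -7/10, 1/2, 6/5; row 2 -7/16, 15/16, 1/2
```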
With the program “show_basis.m” in your computer, type the following commands into the MATLAB command window and see the graphical/textual output.

Now, let us see the following example.


%show_basis
clear
N = [-1 1;1 1;1 -1; -1 -1;0.2 0.5]; %the list of nodes in Fig.9.7
N_n = size(N,1); % the number of nodes
S = [1 2 5;2 3 5;3 4 5;1 4 5]; %the list of subregions in Fig.9.7
N_s = size(S,1); % the number of subregions
figure(1), clf
for s = 1:N_s %Fig.9.7: draw the edges of each triangle
   nodes = [S(s,:) S(s,1)];
   for i = 1:3
      plot([N(nodes(i),1) N(nodes(i + 1),1)], ...
           [N(nodes(i),2) N(nodes(i + 1),2)]), hold on
   end
end
%basis/shape function
p = fem_basis_ftn(N,S);
x0 = -1; xf = 1; y0 = -1; yf = 1; %graphic region
figure(2), clf
Mx = 50; My = 50;
dx = (xf - x0)/Mx; dy = (yf - y0)/My;
xi = x0 + [0:Mx]*dx; yi = y0 + [0:My]*dy;
i_ns = [1 2 3 4 5]; %the list of node numbers whose basis ftn to plot
for itr = 1:5
   i_n = i_ns(itr);
   if itr == 1 %Fig.9.8a: evaluate the basis ftn on a grid and use mesh()
      Z = zeros(length(yi),length(xi));
      for i = 1:length(xi)
         for j = 1:length(yi)
            for s = 1:N_s %which subregion the point belongs to
               if inpolygon(xi(i),yi(j), N(S(s,:),1),N(S(s,:),2)) > 0
                  Z(j,i) = p(i_n,s,1) + p(i_n,s,2)*xi(i) + p(i_n,s,3)*yi(j);
                  break;
               end
            end
         end
      end
      subplot(321), mesh(xi,yi,Z) %basis function for node i_n
   else %Figs.9.8b-e: trimesh() interpolates the nodal values 0/1
      c1 = zeros(1,N_n); c1(i_n) = 1;
      subplot(320 + itr)
      trimesh(S,N(:,1),N(:,2),c1) %basis function for node i_n
   end
end
c = [0 1 2 3 0]; %the values for all nodes
subplot(326)
trimesh(S,N(:,1),N(:,2),c) %Fig.9.8f: a composite function
Example 9.6. Laplace’s Equation: Electric Potential Over a Plate with Point Charge. Consider the following Laplace’s equation:

$$ \nabla^2 u(x,y) = \frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} = f(x,y) \quad \text{for } -1 \le x \le +1,\ -1 \le y \le +1 \qquad (E9.6.1) $$

where

$$ f(x,y) = \begin{cases} -1 & \text{for } (x,y) = (0.5, 0.5) \\ +1 & \text{for } (x,y) = (-0.5, -0.5) \\ 0 & \text{elsewhere} \end{cases} \qquad (E9.6.2) $$

and the boundary condition is u(x, y) = 0 for all boundaries of the rectangular domain.

In order to solve this equation by using the FEM, we locate 12 boundary points and 19 interior points, number them, and divide the domain into 36 triangular subregions as depicted in Fig. 9.9. Note that we have made the size of the subregions small and their density high around the points (+0.5, +0.5) and (−0.5, −0.5), since they are the only two points at which the value of the right-hand side of Eq. (E9.6.1) is not zero, and consequently the value of the solution u(x, y) is expected to change sensitively around them.

Figure 9.9 An example of triangular subregions for FEM.

We made the following MATLAB program “do_fem.m” in order to use the 
routines “fem_basis_ftn()” and “fem_coef ()” for solving this equation. For 
comparison, we have added the statements to solve the same equation by using the 
routine “poisson()” (Section 9.1). The results obtained by running this program 
are depicted in Fig. 9.10a-c. 



Figure 9.10 Results of Example 9.6: (a) 31-point FEM solution drawn by using trimesh(); (b) 31-point FEM solution drawn by using mesh(); (c) FDM (Finite Difference Method) solution.
%do_fem
% for Example 9.6
clear
N = [-1 0;-1 -1;-1/2 -1;0 -1;1/2 -1; 1 -1;1 0;1 1;1/2 1; 0 1;
    -1/2 1;-1 1; -1/2 -1/4; -5/8 -7/16;-3/4 -5/8;-1/2 -5/8;
    -1/4 -5/8;-3/8 -7/16; 0 0; 1/2 1/4;5/8 7/16;3/4 5/8;
    1/2 5/8;1/4 5/8;3/8 7/16;-9/16 -17/32;-7/16 -17/32;
    -1/2 -7/16;9/16 17/32;7/16 17/32;1/2 7/16]; %nodes
N_b = 12; %the number of boundary nodes
S = [1 11 12;1 11 19;10 11 19;4 5 19;5 7 19; 5 6 7;1 2 15; 2 3 15;
    3 15 17;3 4 17;4 17 19;13 17 19;1 13 19; 1 13 15;7 8 22;8 9 22;
    9 22 24;9 10 24; 10 19 24; 19 20 24;7 19 20; 7 20 22;13 14 18;
    14 15 16;16 17 18;20 21 25;21 22 23;23 24 25;14 26 28;
    16 26 27;18 27 28; 21 29 31;23 29 30;25 30 31;
    26 27 28; 29 30 31]; %triangular subregions
f962 = '(norm([x y]+[0.5 0.5])<0.01)-(norm([x y]-[0.5 0.5])<0.01)';
f = inline(f962,'x','y'); %(E9.6.2)
g = inline('0','x','y');
N_n = size(N,1); %the total number of nodes
N_i = N_n - N_b; %the number of interior nodes
c = zeros(1,N_n); %boundary value or 0 for boundary/interior nodes
p = fem_basis_ftn(N,S);
[U,c] = fem_coef(f,g,p,c,N,S,N_i);
%Output through the triangular mesh-type graph
figure(1), clf, trimesh(S,N(:,1),N(:,2),c)
%Output through the rectangular mesh-type graph
N_s = size(S,1); %the total number of subregions(triangles)
x0 = -1; xf = 1; y0 = -1; yf = 1;
Mx = 16; dx = (xf - x0)/Mx; xi = x0 + [0:Mx]*dx;
My = 16; dy = (yf - y0)/My; yi = y0 + [0:My]*dy;
for i = 1:length(xi)
   for j = 1:length(yi)
      for s = 1:N_s %which subregion the point belongs to
         if inpolygon(xi(i),yi(j), N(S(s,:),1),N(S(s,:),2)) > 0
            Z(j,i) = U(s,:)*[1 xi(i) yi(j)]'; %Eq.(9.4.6)
            break;
         end
      end
   end
end
figure(2), clf, mesh(xi,yi,Z)
%For comparison
bx0 = inline('0'); bxf = inline('0');
by0 = inline('0'); byf = inline('0');
[U,x,y] = poisson(f,g,bx0,bxf,by0,byf,[x0 xf y0 yf],Mx,My);
figure(3), clf, mesh(x,y,U)


9.5 GUI OF MATLAB FOR SOLVING PDES: PDETOOL 

In this section, we will see what problems can be solved by using the GUI (graphical user interface) tool of MATLAB for PDEs and then apply the tool to solve the elliptic/parabolic/hyperbolic equations dealt with in Examples 9.1/9.3/9.5 and 9.6.
9.5.1 Basic PDEs Solvable by PDETOOL

Basically, the PDE toolbox can be used for the following kinds of PDE.

1. Elliptic PDE

$$ -\nabla\cdot(c\nabla u) + au = f \quad \text{over a domain } \Omega \qquad (9.5.1) $$

with some boundary conditions like

$$ hu = r \ \ \text{(Dirichlet condition)} \quad \text{or} \quad \mathbf{n}\cdot(c\nabla u) + qu = g \ \ \text{(generalized Neumann condition)} \qquad (9.5.2) $$

on the boundary $\partial\Omega$, where n is the outward unit normal vector to the boundary.

Note that, in case u is a scalar-valued function on a rectangular domain as depicted in Fig. 9.1 (and c is a constant), Eq. (9.5.1) becomes

$$ -c\left(\frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2}\right) + au(x,y) = f(x,y) \qquad (9.5.3) $$

and if the boundary condition for the left-side boundary segment is of Neumann type like Eq. (9.1.7), Eq. (9.5.2) can be written as

$$ \mathbf{n}\cdot c\left(\frac{\partial u(x,y)}{\partial x}\mathbf{i} + \frac{\partial u(x,y)}{\partial y}\mathbf{j}\right) + qu(x,y) = -c\frac{\partial u(x,y)}{\partial x} + qu(x,y) = g(x,y) \qquad (9.5.4) $$

since the outward unit normal vector to the left-side boundary is n = −i, where i and j are the unit vectors along the x axis and y axis, respectively.

2. Parabolic PDE

$$ -\nabla\cdot(c\nabla u) + au + d\frac{\partial u}{\partial t} = f \quad \text{over a domain } \Omega \text{ and for a time range } 0 \le t \le T \qquad (9.5.5) $$

with boundary conditions like Eq. (9.5.2) and, additionally, the initial condition $u(t_0)$.

3. Hyperbolic PDE

$$ -\nabla\cdot(c\nabla u) + au + d\frac{\partial^2 u}{\partial t^2} = f \quad \text{over a domain } \Omega \text{ and for a time range } 0 \le t \le T \qquad (9.5.6) $$

with boundary conditions like Eq. (9.5.2) and, additionally, the initial conditions $u(t_0)/u'(t_0)$.

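4. Eigenmode PDE

$$ -\nabla\cdot(c\nabla u) + au = \lambda du \quad \text{over a domain } \Omega \text{ and for an unknown eigenvalue } \lambda \qquad (9.5.7) $$

with some boundary conditions like Eq. (9.5.2).

The PDE toolbox can also deal with a system of PDEs like

$$ \begin{aligned} -\nabla\cdot(c_{11}\nabla u_1) - \nabla\cdot(c_{12}\nabla u_2) + a_{11}u_1 + a_{12}u_2 &= f_1 \\ -\nabla\cdot(c_{21}\nabla u_1) - \nabla\cdot(c_{22}\nabla u_2) + a_{21}u_1 + a_{22}u_2 &= f_2 \end{aligned} \quad \text{over a domain } \Omega \qquad (9.5.8) $$

with Dirichlet boundary conditions like

$$ \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} \qquad (9.5.9) $$

or generalized Neumann boundary conditions like

$$ \begin{aligned} \mathbf{n}\cdot(c_{11}\nabla u_1) + \mathbf{n}\cdot(c_{12}\nabla u_2) + q_{11}u_1 + q_{12}u_2 &= g_1 \\ \mathbf{n}\cdot(c_{21}\nabla u_1) + \mathbf{n}\cdot(c_{22}\nabla u_2) + q_{21}u_1 + q_{22}u_2 &= g_2 \end{aligned} \qquad (9.5.10) $$

or mixed boundary conditions, where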



9.5.2 The Usage of PDETOOL 

The PDEtool in MATLAB solves PDEs by using the FEM (finite element method). 
We should take the following steps to use it. 


0. Type ‘pdetool’ into the MATLAB command window to have the PDE toolbox window on the screen as depicted in Fig. 9.11. You can toggle on/off the grid by clicking ‘Grid’ in the Options pull-down menu (Fig. 9.12a). You can also adjust the ranges of the x axis and the y axis in the box window opened by clicking ‘Axes_Limits’ in the Options pull-down menu. If you want the rectangles to be aligned with the grid lines, click ‘Snap(-to-grid)’ in the Options pull-down menu (Fig. 9.12a). If you want to have the x axis and the y axis of equal scale so that a circle/square may not look like an ellipse/rectangle, click ‘Axes_Equal’ in the Options pull-down menu. You can choose the type of PDE problem you want to solve in the submenu popped out by clicking ‘Application’ in the Options pull-down menu (Fig. 9.12a).

Figure 9.11 The GUI (graphical user interface) window of the MATLAB PDEtool.

(cf) In order to be able to specify the boundary condition for a boundary segment by clicking it, the segment must be inside the graphic region of PDEtool.

1. In Draw mode, you can create the two-dimensional geometry of domain Ω by using the constructive solid geometry (CSG) paradigm, which enables us to make a set of solid objects such as rectangles, circles/ellipses, and polygons. In order to do so, click the object that you want to draw in the Draw pull-down menu (Fig. 9.12b) or click the button with the corresponding icon in the tool-bar just below the top menu-bar (Fig. 9.11). Then, you can click-and-drag to create/move the object of any size at any position as you like. Once an object is drawn, it can be selected by clicking on it. Note that the selected object becomes surrounded by a black solid line and can be deleted by pressing the Delete or ^R (Ctrl-R) key. The created object is automatically labeled, but it can be relabeled and resized (numerically) through the Object dialog box opened by double-clicking the object and even rotated (numerically) through the box opened by clicking ‘Rotate’ in the Draw pull-down menu. After creating and positioning the objects, you can make a CSG model by editing the set formula appropriately in the set formula field of the second line below the top menu-bar to take the union (by default), the intersection, and the set difference of the objects to form the shape of the domain (Fig. 9.11). If you want to see the overall shape of the domain you created, click ‘Boundary_Mode’ in the Boundary pull-down menu.

2. In Boundary mode, you can remove the subdomain borders that are induced by the intersections of the solid objects but are not between different materials, and also specify the boundary condition for each boundary segment. First, click the ∂Ω button in the tool-bar (Fig. 9.11) or ‘Boundary_Mode (^B)’ in the Boundary pull-down menu (Fig. 9.12c), which will make the boundary segments appear with red/blue/green colors (indicating a Dirichlet (default)/Neumann/mixed type of boundary condition) and arrows toward their ends (for the case where the boundary condition is parameterized along the boundary). When you want to remove all the subdomain borders, click ‘Remove_All_Subdomain_Borders’ in the Boundary pull-down menu. You can set the parameters h, r or g, q in Eq. (9.5.2) to a constant or a function of x and y specifying the boundary condition, through the box window opened by double-clicking each boundary segment. In case you want to specify/change the boundary condition for multiple segments at a time, you had better shift-click the segments to select all of them (which will be colored black) and double-click one of them to get the boundary condition dialog box.

3. In PDE mode, you can specify the type of PDE (Elliptic/Parabolic/Hyperbolic/Eigenmode) and its parameters. In order to do so, open the PDE specification dialog box by clicking the PDE button in the tool-bar or ‘PDE_Specification’ in the PDE pull-down menu (Fig. 9.12d), check the type of PDE, and set its parameters in Eq. (9.5.1)/(9.5.5)/(9.5.6)/(9.5.7).

4. In Mesh mode, you can create the triangular mesh for the domain drawn in Draw mode by just clicking the Δ button in the tool-bar or ‘Initialize_Mesh (^I)’ in the Mesh pull-down menu (Fig. 9.12e). To improve the accuracy of the solution, you can successively refine the mesh by clicking the refine-mesh button in the tool-bar or ‘Refine_Mesh (^M)’ in the Mesh pull-down menu. You can jiggle the mesh by clicking ‘Jiggle_Mesh’ in expectation of better accuracy. You can also undo any refinement by clicking ‘Undo_Mesh_Change’ in the Mesh pull-down menu.

5. In Solve mode, you can solve the PDE and plot the result by just clicking the = button in the tool-bar or ‘Solve_PDE (^E)’ in the Solve pull-down menu (Fig. 9.12f). But, in the case of a parabolic or hyperbolic PDE, you must click ‘Parameters’ in the Solve pull-down menu (Fig. 9.12f) to set up the initial conditions and the time range before solving the PDE.
6. In Plot mode, you can change the plot option in the Plot selection dialog box opened by clicking the plot button in the tool-bar or ‘Parameters’ in the Plot pull-down menu (Fig. 9.12g). In the Plot selection dialog box (Fig. 9.12h), you can set the plot type to, say, Color/Height(3-D) and set the plot style to, say, interpolated shading and continuous (interpolated) height. If you want the mesh to be shown in the solution graph, check the box of Show_mesh. In case you want to plot the graph of a known function, change the option(s) of the Property into ‘user_entry’, type in the MATLAB expression describing the function, and click the Plot button. You can save the plot parameters as the current default by clicking the Done button. You can also change the color map in the second line from the bottom of the dialog box.

(cf) We can extract the parameters involved in the domain geometry by clicking ‘Export..’ 
in the Draw pull-down menu, the parameters specifying the boundary by clicking 
‘Export..’ in the Boundary pull-down menu, the parameters specifying the PDE by 
clicking ‘Export..’ in the PDE pull-down menu, the parameters specifying the mesh 
by clicking ‘Export..’ in the Mesh pull-down menu, the parameters related to the 
solution by clicking ‘Export..’ in the Solve pull-down menu, and the parameters 
related to the graph by clicking ‘Export..’ in the Plot pull-down menu. Whenever you want to save what you have worked on in PDEtool, you may select File/Save in the top menu-bar.

(cf) Visit the website “http://www.mathworks.com/access/helpdesk/help/helpdesk.html” for more details.


9.5.3 Examples of Using PDETOOL to Solve PDEs 

In this section, we will make use of PDEtool to solve some PDE problems that 
were dealt with in the previous sections. 


Example 9.7. Laplace’s Equation: Steady-State Temperature Distribution Over a Plate. Consider Laplace’s equation (Example 9.1)

$$ \nabla^2 u(x,y) = \frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} = 0 \quad \text{for } 0 \le x \le 4,\ 0 \le y \le 4 \qquad (E9.7.1) $$

with the following boundary conditions:

$$ u(0,y) = e^y - \cos y, \qquad u(4,y) = e^y\cos 4 - e^4\cos y \qquad (E9.7.2) $$
$$ u(x,0) = \cos x - e^x, \qquad u(x,4) = e^4\cos x - e^x\cos 4 \qquad (E9.7.3) $$


The procedure for using PDEtool to solve this problem is as follows: 

0. Type ‘pdetool’ into the MATLAB command window to have the PDE toolbox window on the screen. Then, adjust the ranges of the x axis and the y axis to [0 5] and [0 5], respectively, in the dialog box opened by clicking ‘Axes_Limits’ in the Options pull-down menu. You can also click ‘Axes_Equal’ in the Options pull-down menu to have the x axis and the y axis of equal scale so that a circle/square may not look like an ellipse/rectangle.

1. Click the □ button in the tool-bar and click-and-drag on the graphic 
region to create a rectangle of domain. Then, in the Object dialog box 
opened by double-clicking the rectangle, set the Left/Bottom/Width/Height 
to 0/0/4/4. In this case, you don’t have to construct a CSG model by 
editing the set formula, because the domain consists of a single object: 
a rectangle. 

2. Click the ∂Ω button in the tool-bar and double-click each boundary segment to specify the boundary condition as Eqs. (E9.7.2) and (E9.7.3) in the boundary condition dialog box (see Fig. 9.13a).

3. Open the PDE specification dialog box by clicking the PDE button in the 
tool-bar, check the box on the left of Elliptic as the type of PDE, and set 
its parameters in Eq. (E9.7.1) as depicted in Fig. 9.13b. 

4. Click the Δ button in the tool-bar to divide the domain into a number of triangular subdomains to get the triangular mesh as depicted in Fig. 9.13c. You can click the refine-mesh button in the tool-bar to refine the mesh successively for better accuracy.

5. Click the = button in the tool-bar to plot the solution in the form of a two-dimensional graph with the value of u(x, y) shown in color.

6. If you want to plot the solution in the form of a three-dimensional graph with the value of u(x, y) shown in height as well as color, check the box before Height on the far-left side of the Plot selection dialog box opened by clicking the plot button in the tool-bar. If you want the mesh shown in the solution plot as in Fig. 9.13d, check the box before Show_mesh on the far-left and low side and click the Plot button at the bottom of the Plot selection dialog box (Fig. 9.12h). You can compare the result with that of Example 9.3 depicted in Fig. 9.4.

7. If you have the true analytical solution

$$ u(x,y) = e^y\cos x - e^x\cos y \qquad (E9.7.4) $$

and you want to plot the difference between the PDEtool (FEM) solution and the true analytical solution, change the entry ‘u’ into ‘user entry’ in the Color/Contour row and the Height row of the Property column and write ‘u-(exp(y).*cos(x)-exp(x).*cos(y))’ into the corresponding fields in the User_entry column of the Plot selection dialog box opened by clicking the plot button in the tool-bar, and click the Plot button at the bottom of the dialog box.
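Before comparing with PDEtool, it may be worth confirming that (E9.7.4) really solves (E9.7.1)-(E9.7.3). The following script (the grid spacing h = 0.02 is an arbitrary choice) evaluates the 5-point discrete Laplacian of (E9.7.4), which should only be of the order of its $O(h^2)$ truncation error, and the mismatch with the four boundary formulas, which should be at rounding level.

```matlab
% Check that u = e^y cos x - e^x cos y satisfies (E9.7.1)-(E9.7.3).
uex = @(x,y) exp(y).*cos(x) - exp(x).*cos(y);
h = 0.02; [X,Y] = meshgrid(0:h:4, 0:h:4);       % rows follow y, columns follow x
U = uex(X,Y);
lap = (U(2:end-1,3:end) - 2*U(2:end-1,2:end-1) + U(2:end-1,1:end-2))/h^2 ...
    + (U(3:end,2:end-1) - 2*U(2:end-1,2:end-1) + U(1:end-2,2:end-1))/h^2;
s = linspace(0,4,101);                          % points along each boundary side
bc_err = max([abs(uex(0,s) - (exp(s) - cos(s))), ...
              abs(uex(4,s) - (exp(s)*cos(4) - exp(4)*cos(s))), ...
              abs(uex(s,0) - (cos(s) - exp(s))), ...
              abs(uex(s,4) - (exp(4)*cos(s) - exp(s)*cos(4)))]);
fprintf('max |discrete Laplacian| = %g, max BC mismatch = %g\n', ...
        max(abs(lap(:))), bc_err);
```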
Example 9.8. A Parabolic PDE: Two-Dimensional Temperature Diffusion Over a Plate. Consider a two-dimensional parabolic PDE

$$ 10^{-4}\left(\frac{\partial^2 u(x,y,t)}{\partial x^2} + \frac{\partial^2 u(x,y,t)}{\partial y^2}\right) = \frac{\partial u(x,y,t)}{\partial t} \quad \text{for } 0 \le x \le 4,\ 0 \le y \le 4 \text{ and } 0 \le t \le 5000 \qquad (E9.8.1) $$

with the initial conditions and boundary conditions

$$ u(x,y,0) = 0 \quad \text{for } t = 0 \qquad (E9.8.2a) $$
$$ u(x,y,t) = e^y\cos x - e^x\cos y \quad \text{for } x = 0,\ x = 4,\ y = 0,\ y = 4 \qquad (E9.8.2b) $$

The procedure for using the PDEtool to solve this problem is as follows. 

0-2. Do exactly the same things as steps 0-2 for the case of an elliptic PDE 
in Example 9.7. 

3. Open the PDE specification dialog box by clicking the PDE button, 
check the box on the left of ‘Parabolic’ as the type of PDE and set its 
parameters in Eq. (E9.8.1) as depicted in Fig. 9.14a. 

4. Exactly as in step 4 (for the case of elliptic PDE) in Example 9.7, click the Δ button to get the triangular mesh. You can click the refine-mesh button to refine the mesh successively for better accuracy.

5. Unlike the case of an elliptic PDE, you must click ‘Parameters’ in 
the Solve pull-down menu (Fig. 9.12f) to set the time range, say, as 
0:100:5000 and the initial conditions as Eq. (E9.8.2a) before clicking 
the = button to solve the PDE. (See Fig. 9.14b.) 

6. As in step 6 of Example 9.7, you can check the box before Height in the Plot selection dialog box opened by clicking the plot button, check the box before Show_mesh, and click the Plot button. If you want to plot the solution graph at a time other than the final time, select the time for plot from

{0, 100, 200, ..., 5000}

in the far-right field of the Plot selection dialog box and click the Plot button again. If you want to see a movie-like dynamic picture of the solution graph, check the box before Animation, click Options right after Animation, fill in the fields of the animation rate in fps (i.e., the number of frames per second) and the number of repeats in the Animation Options dialog box, click the OK button, and then click the Plot button in the Plot selection dialog box.

(cf) If the dynamic picture is too oblong, you can scale up/down the solution by changing the Property of the Height row from ‘u’ into ‘user entry’ and filling in the corresponding field of User_entry with, say, ‘u/25’ in the Plot selection dialog box.
Procedure and results of using PDEtool for Example 9.3/9.
According to your selection, you will see a movie-like dynamic picture or the (final) solution graph like Fig. 9.14d, which is the steady-state solution for Eq. (E9.8.1) with $\partial u(x,y,t)/\partial t = 0$, virtually the same as the elliptic PDE (E9.7.1) whose solution is depicted in Fig. 9.13d.

Before closing this example, let us try exporting the values of some parameters. For example, we extract the mesh data {p, e, t} by clicking ‘Export_Mesh’ in the Mesh pull-down menu and then clicking the OK button in the Export dialog box. Among the mesh data, the matrix p contains the x and y coordinates in the first and second rows, respectively. We also extract the solution u by clicking ‘Export_Solution’ in the Solve pull-down menu and then clicking the OK button in the Export dialog box. Now, we can estimate how far the graphical/numerical solution deviates from the true steady-state solution $u(x,y) = e^y\cos x - e^x\cos y$ by typing the following statements into the MATLAB command window.

>>x = p(1,:)'; y = p(2,:)'; %x,y coordinates of nodes in column vectors
>>err = exp(y).*cos(x) - exp(x).*cos(y) - u(:,end); %deviation from true sol.
>>err_max = max(abs(err)) %maximum absolute error

Note that the dimension of the solution matrix u is 177 x 51 and the solution
at the final stage is stored in its last column u(:,end), where 177 is the number
of nodes in the triangular mesh and 51 = 5000/100 + 1 is the number of frames
or time stages.
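As a quick check that the reference solution used above really is a steady state, one can substitute it into Laplace's equation, which is what the problem should reduce to once $\partial u/\partial t = 0$ (assuming, as the comparison with (E9.7.1) suggests, that the steady-state problem is Laplace's equation):

$$\frac{\partial^2}{\partial x^2}\left(e^y\cos x - e^x\cos y\right) = -e^y\cos x - e^x\cos y,\qquad \frac{\partial^2}{\partial y^2}\left(e^y\cos x - e^x\cos y\right) = e^y\cos x + e^x\cos y$$

The two second derivatives cancel, so $\nabla^2 u = 0$ and the deviation err computed above measures only the numerical (mesh and time-stepping) error.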

Example 9.9. A Hyperbolic PDE: Two-Dimensional Wave (Vibration) Over a
Square Membrane. Consider a two-dimensional hyperbolic PDE

$$\frac{1}{4}\left(\frac{\partial^2 u(x,y,t)}{\partial x^2} + \frac{\partial^2 u(x,y,t)}{\partial y^2}\right) = \frac{\partial^2 u(x,y,t)}{\partial t^2} \quad \text{for } 0 \le x \le 2,\ 0 \le y \le 2, \text{ and } 0 \le t \le 2 \tag{E9.9.1}$$

with the zero boundary conditions and the initial conditions

$$u(0,y,t) = 0,\quad u(2,y,t) = 0,\quad u(x,0,t) = 0,\quad u(x,2,t) = 0 \tag{E9.9.2}$$

$$u(x,y,0) = 0.1\sin(\pi x)\sin(\pi y/2),\qquad \frac{\partial u}{\partial t}(x,y,0) = 0 \tag{E9.9.3}$$
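Because the initial shape in (E9.9.3) is a single product of sines, a separation-of-variables sketch gives a closed form to compare the PDEtool plots against. This assumes (E9.9.1)-(E9.9.3) exactly as written above; it is a check, not part of the PDEtool procedure:

$$u(x,y,t) = 0.1\sin(\pi x)\sin\!\left(\frac{\pi y}{2}\right)\cos(\omega t),\qquad \omega^2 = \frac{1}{4}\left(\pi^2 + \frac{\pi^2}{4}\right) = \frac{5\pi^2}{16},\qquad \omega = \frac{\sqrt{5}\,\pi}{4} \approx 1.756$$

It vanishes on all four edges, has zero initial velocity, and oscillates with period $2\pi/\omega = 8/\sqrt{5} \approx 3.58$, so over $0 \le t \le 2$ the membrane should swing from its initial shape to a nearly inverted one.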

The procedure for using the PDEtool to solve this problem is as follows: 

0-2. Do the same things as steps 0-2 for the case of elliptic PDE in Example 9.7,
except for the following.

• Set the ranges of the x-axis and the y-axis to [0 3] and [0 3].

• Set the Left/Bottom/Width/Height to 0/0/2/2 in the Object dialog box
opened by double-clicking the rectangle.

• Set the boundary condition to zero as specified by Eqs. (E9.9.2) in
the boundary condition dialog box opened by clicking the ∂Ω button
in the tool-bar, shift-clicking the four boundary segments, and double-clicking
one of the boundary segments.
Figure 9.15 Procedure and results of using PDEtool for Example 9.5/9.9: (d1) mesh plot of the solution at t = 0.1; (d2) mesh plot of the solution at t = 1.7.


3. Open the PDE specification dialog box by clicking the PDE button, 
check the box on the left of ‘Hyperbolic’ as the type of PDE, and set 
its parameters in Eq. (E9.9.1) as depicted in Fig. 9.15a. 

4. Do the same thing as step 4 for the case of elliptic PDE in Example 9.7.

5. Similarly to the case of a parabolic PDE, you must click ‘Parameters’ 
in the Solve pull-down menu (Fig. 9.12f) to set the time range, say, as 
0:0.1:2 and the initial conditions as Eq. (E9.9.3) before clicking the =
button to solve the PDE. (See Fig. 9.15b.) 

6. Do almost the same thing as step 6 for the case of parabolic PDE in 
Example 9.8. 

Finally, you will see solution graphs like Figs. 9.15(d1) and (d2), which are
similar to Figs. 9.6(a) and (c).


Example 9.10. Laplace’s Equation: Electric Potential Over a Plate with Point
Charge. Consider Laplace’s equation (dealt with in Example 9.6)

$$\nabla^2 u(x,y) = \frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} = f(x,y) \quad \text{for } -1 \le x \le +1,\ -1 \le y \le +1 \tag{E9.10.1}$$

where

$$f(x,y) = \begin{cases} -1 & \text{for } (x,y) = (0.5,\ 0.5) \\ +1 & \text{for } (x,y) = (-0.5,\ -0.5) \\ 0 & \text{elsewhere} \end{cases} \tag{E9.10.2}$$

and the boundary condition is u(x, y) = 0 on all boundaries of the rectangular domain.

The procedure for using the PDEtool to solve this problem is as follows. 


0-2. Do the same things as steps 0-2 for the case of elliptic PDE in Example 9.7,
except for the following.

• Set the Left/Bottom/Width/Height to -1/-1/2/2 in the Object dialog
box opened by double-clicking the rectangle.

• Set the boundary condition to zero in the boundary condition dialog box
opened by clicking the ∂Ω button in the tool-bar, shift-clicking the four
boundary segments, and double-clicking one of the boundary segments.

3. Open the PDE specification dialog box by clicking the PDE button, check
the box on the left of ‘Elliptic’ as the type of PDE, and set its parameters
in Eqs. (E9.10.1) and (E9.10.2) as depicted in Fig. 9.16a.

4. Click the Δ button to initialize the triangular mesh.

5. Click the ^ button to open the Plot selection dialog box, check the box 
before ‘Height’, and check the box before ‘Show_mesh’ in the dialog box. 

6. Click the Plot button to get the solution graph as depicted in Fig. 9.16c. 

7. Click ‘Parameters’ in the Solve pull-down menu to open the ‘Solve Parameters’
dialog box depicted in Fig. 9.16b, check the box on the left of
‘Adaptive mode’, and click the OK button in order to activate the adaptive
mesh mode.

8. Click the = button to get a solution graph with the adaptive mesh. 
9. Noting that the solution is not the right one for the point charge distribution
given by (E9.10.2), reopen the PDE specification dialog box by clicking
the PDE button and rewrite f as below.

f:  -((x+0.5).^2+(y+0.5).^2<0.0001)+((x-0.5).^2+(y-0.5).^2<0.0001)

10. Noting that the mesh has already been refined adaptively, with
smaller elements in the region where the slope of the solution is
steeper, click ‘Parameters’ in the Solve pull-down menu to open the ‘Solve
Parameters’ dialog box, uncheck the box on the left of ‘Adaptive mode’,
and click the OK button in the dialog box in order to deactivate the
adaptive mesh mode.

11. Click the = button to get the solution graph as depicted in Fig. 9.16d. 

12. You can click ‘Refine_Mesh (^M)’ in the Mesh pull-down menu and click
the = button to get a more refined solution graph (with higher resolution)
as many times as you want.
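The sign flip between (E9.10.2) and the f entered in step 9 follows from the form in which PDEtool states an elliptic problem, $-\nabla\cdot(c\nabla u) + a u = f$ (the PDE Toolbox convention). A short derivation, assuming c = 1 and a = 0 as for this example:

$$-\nabla\cdot(1\,\nabla u) + 0\cdot u = -\nabla^2 u = -f(x,y) \;\Longrightarrow\; f_{\text{PDEtool}}(x,y) = \begin{cases} +1 & \text{near } (0.5,\ 0.5) \\ -1 & \text{near } (-0.5,\ -0.5) \\ 0 & \text{elsewhere} \end{cases}$$

Also, a source that is nonzero only at isolated points contributes nothing to the integrals the finite element method evaluates, which is why each charge has to be spread over a small disk before it shows up in the solution.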


PROBLEMS 


9.1 Elliptic PDEs: Poisson Equations 

Use the routine “poisson()” (in Section 9.1) to solve the following PDEs 
and plot the solutions by using the MATLAB command “mesh()”. 


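(a)

$$\frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} = 0 \quad \text{for } 0 \le x \le 1,\ 0 \le y \le 1 \tag{P9.1.1}$$

with the boundary conditions

$$u(0,y) = y^2,\quad u(1,y) = 1,\quad u(x,0) = x^2,\quad u(x,1) = 1 \tag{P9.1.2}$$

Divide the solution region (domain) into $M_x \times M_y = 5 \times 10$ sections.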


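(b)

$$\frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} - 12.5\pi^2 u(x,y) = -25\pi^2\cos\!\left(\frac{5\pi}{2}x\right)\cos\!\left(\frac{5\pi}{2}y\right) \quad \text{for } 0 \le x, y \le 0.4 \tag{P9.1.3}$$

with the boundary conditions

$$u(0,y) = \cos\!\left(\frac{5\pi}{2}y\right),\quad u(0.4,y) = -\cos\!\left(\frac{5\pi}{2}y\right) \tag{P9.1.4}$$

$$u(x,0) = \cos\!\left(\frac{5\pi}{2}x\right),\quad u(x,0.4) = -\cos\!\left(\frac{5\pi}{2}x\right) \tag{P9.1.5}$$

Divide the solution region into $M_x \times M_y = 40 \times 40$ sections.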


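(c)

$$\frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} + 4\pi^2(x^2+y^2)\,u(x,y) = 4\pi\cos\!\big(\pi(x^2+y^2)\big) \quad \text{for } 0 \le x \le 1,\ 0 \le y \le 1 \tag{P9.1.6}$$

with the boundary conditions

$$u(0,y) = \sin(\pi y^2),\quad u(1,y) = \sin\!\big(\pi(y^2+1)\big) \tag{P9.1.7}$$

$$u(x,0) = \sin(\pi x^2),\quad u(x,1) = \sin\!\big(\pi(x^2+1)\big) \tag{P9.1.8}$$

Divide the solution region into $M_x \times M_y = 40 \times 40$ sections.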

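(d)

$$\frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} = \cdots \quad \text{for } 0 \le x \le 1,\ 0 \le y \le 2 \tag{P9.1.9}$$

with the boundary conditions

$$u(0,y) = 2e^{y},\quad u(1,y) = 2e^{2+y},\quad u(x,0) = 2e^{2x},\quad u(x,2) = 2e^{2x+2} \tag{P9.1.10}$$

Divide the solution region into $M_x \times M_y = 20 \times 40$ sections.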


(e)

$$\frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} = 0 \quad \text{for } 0 \le x \le 1,\ 0 \le y \le \pi/2 \tag{P9.1.11}$$

with the boundary conditions

$$u(0,y) = 4\cos(3y),\quad u(1,y) = 4e^{-3}\cos(3y),\quad u(x,0) = 4e^{-3x},\quad u(x,\pi/2) = 0 \tag{P9.1.12}$$

Divide the solution region into $M_x \times M_y = 20 \times 20$ sections.

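For readers who want to see what the iteration inside a routine like “poisson()” does before modifying it, here is a minimal sketch for part (e). It is not the book's routine: it assumes a plain Gauss-Seidel sweep of the five-point difference equation with the weights rx, ry named as in the partial listing of poisson_Neuman() later in this section, and it uses the function $4e^{-3x}\cos(3y)$, which satisfies both (P9.1.11) and (P9.1.12), as a reference for the error.

```matlab
% Sketch only (not the book's poisson()): Gauss-Seidel sweeps of the
% five-point difference equation for Problem 9.1(e), u_xx + u_yy = 0
% on 0 <= x <= 1, 0 <= y <= pi/2, with the boundary data (P9.1.12).
Mx = 20; My = 20; x0 = 0; xf = 1; y0 = 0; yf = pi/2;
dx = (xf - x0)/Mx; dy = (yf - y0)/My;
x = x0 + (0:Mx)*dx;                  % row vector of x nodes
y = (y0 + (0:My)*dy)';               % column vector of y nodes
u = zeros(My+1, Mx+1);               % u(m,n) approximates u(x(n), y(m))
u(:,1) = 4*cos(3*y);                 % u(0,y)
u(:,end) = 4*exp(-3)*cos(3*y);       % u(1,y)
u(1,:) = 4*exp(-3*x);                % u(x,0)
u(end,:) = 0;                        % u(x,pi/2)
dxy2 = 2*(dx^2 + dy^2);
rx = dx^2/dxy2; ry = dy^2/dxy2;      % weights of the y- and x-neighbours
tol = 1e-8; imax = 10000;
for iter = 1:imax
   change = 0;
   for m = 2:My
      for n = 2:Mx
         unew = ry*(u(m,n+1) + u(m,n-1)) + rx*(u(m+1,n) + u(m-1,n));
         change = max(change, abs(unew - u(m,n)));
         u(m,n) = unew;               % Gauss-Seidel: use the new value at once
      end
   end
   if change < tol, break; end
end
[X, Y] = meshgrid(x, y);
err_max = max(max(abs(u - 4*exp(-3*X).*cos(3*Y))))   % vs exact 4e^{-3x}cos(3y)
mesh(X, Y, u), xlabel('x'), ylabel('y')
```

The stopping test compares successive sweeps rather than the true error; err_max is what reflects the $O(\Delta x^2 + \Delta y^2)$ discretization error.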
9.2 More General PDE Having Nonunity Coefficients 

Consider the following PDE having nonunity coefficients.

$$A\frac{\partial^2 u(x,y)}{\partial x^2} + B\frac{\partial^2 u(x,y)}{\partial x\,\partial y} + C\frac{\partial^2 u(x,y)}{\partial y^2} + g(x,y)\,u(x,y) = f(x,y) \tag{P9.2.1}$$

Modify the routine “poisson()” so that it can solve this kind of PDE and
declare it as

function [u,x,y] = poisson_abc(ABC,f,g,bx0,bxf,by0,...,Mx,My,tol,imax)

where the first input argument ABC is supposed to carry the vector containing
the three coefficients A, B, and C. Use the routine to solve the following PDEs
and plot the solutions by using the MATLAB command “mesh()”.


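(a)

$$\frac{\partial^2 u(x,y)}{\partial x^2} - 4\frac{\partial^2 u(x,y)}{\partial y^2} = 0 \quad \text{for } 0 \le x \le 1,\ 0 \le y \le 1 \tag{P9.2.2}$$

with the boundary conditions

$$u(0,y) = y^2,\quad u(1,y) = (y+2)^2,\quad u(x,0) = 4x^2,\quad u(x,1) = (2x+1)^2 \tag{P9.2.3}$$

Divide the solution region (domain) into $M_x \times M_y = 20 \times 40$ sections.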


(b)

$$\frac{\partial^2 u(x,y)}{\partial x^2} + 3\frac{\partial^2 u(x,y)}{\partial x\,\partial y} + 2\frac{\partial^2 u(x,y)}{\partial y^2} = 0 \quad \text{for } 0 \le x \le 1,\ 0 \le y \le 1 \tag{P9.2.4}$$

with the boundary conditions

$$u(0,y) = e^{y} + \cos y,\quad u(1,y) = e^{y-1} + \cos(y-2) \tag{P9.2.5}$$

$$u(x,0) = e^{-x} + \cos(-2x),\quad u(x,1) = e^{1-x} + \cos(1-2x) \tag{P9.2.6}$$

Divide the solution region into $M_x \times M_y = 40 \times 40$ sections.


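(c)

$$\frac{\partial^2 u(x,y)}{\partial x^2} \;\cdots\; \frac{\partial^2 u(x,y)}{\partial x\,\partial y} \;\cdots\; \frac{\partial^2 u(x,y)}{\partial y^2} = \cdots \sin y \quad \text{for } 0 \le x \le 2,\ 0 \le y \le \pi \tag{P9.2.7}$$

with the boundary conditions

$$u(0,y) = \tfrac{3}{4}\cos y,\quad u(2,y) = -\sin y + \tfrac{3}{4}\cos y \tag{P9.2.8}$$

$$u(x,0) = \tfrac{3}{4},\quad u(x,\pi) = -\tfrac{3}{4} \tag{P9.2.9}$$

Divide the solution region into $M_x \times M_y = 20 \times 40$ sections.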
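(d)

$$4\frac{\partial^2 u(x,y)}{\partial x^2} - 4\frac{\partial^2 u(x,y)}{\partial x\,\partial y} + \frac{\partial^2 u(x,y)}{\partial y^2} = 0 \quad \text{for } 0 \le x \le 1,\ 0 \le y \le 1 \tag{P9.2.10}$$

with the boundary conditions

$$u(0,y) = y\,e^{2y},\quad u(1,y) = (1+y)\,e^{1+2y},\quad u(x,0) = x\,e^{x},\quad u(x,1) = (x+1)\,e^{x+2} \tag{P9.2.11}$$

Divide the solution region into $M_x \times M_y = 40 \times 40$ sections.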
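The true solution quoted for the same PDE in Problem 9.8(a), $u = (x+y)e^{x+2y}$, gives a reference for checking the output of poisson_abc() on part (d). A sketch of the check by direct differentiation, taking (P9.2.10) as $4u_{xx} - 4u_{xy} + u_{yy} = 0$ and writing $s = x + 2y$:

$$u_{xx} = (2+x+y)e^{s},\qquad u_{xy} = (3+2x+2y)e^{s},\qquad u_{yy} = (4+4x+4y)e^{s}$$

$$4u_{xx} - 4u_{xy} + u_{yy} = \big[(8+4x+4y) - (12+8x+8y) + (4+4x+4y)\big]e^{s} = 0$$

Setting x = 0, x = 1, y = 0, and y = 1 in $(x+y)e^{x+2y}$ reproduces the four boundary functions in (P9.2.11).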





function [u,x,y] = poisson_Neuman(f,g,bx0,bxf,by0,byf,x0,xf,y0,yf,...)
% ...
Neum(1) = x0(2);
Neum(2) = xf(2);
Neum(3) = y0(2);
Neum(4) = yf(2);
% ...
dx_2 = dx*dx; dy_2 = dy*dy; dxy2 = 2*(dx_2 + dy_2);
rx = dx_2/dxy2; ry = dy_2/dxy2; rxy = rx*dy_2;
dx2 = dx*2; dy2 = dy*2; rydx = ry*dx2; rxdy = rx*dy2;
u(1:My1,1:Mx1) = zeros(My1,Mx1);
% ...
if Neum(1) == 0 %Dirichlet boundary condition
   for m = 1:My1, u(m,1) = bx0(y(m)); end %side a
else %Neumann boundary condition
   for m = 1:My1, duxa(m) = bx0(y(m)); end %du/dx(x0,y)
end
if Neum(2) == 0 %Dirichlet boundary condition
   % ...
end
if Neum(3) == 0 %Dirichlet boundary condition
   % ...
   if Neum(1) == 0, u(1,1) = (u(1,1) + by0(x(1)))/2; n1 = 2; end
   if Neum(2) == 0, u(1,Mx1) = (u(1,Mx1) + by0(x(Mx1)))/2; nM1 = Mx; end
   for n = n1:nM1, u(1,n) = by0(x(n)); end %side c
else %Neumann boundary condition
   for n = 1:Mx1, duyc(n) = by0(x(n)); end %du/dy(x,y0)
end
% ...
if Neum(1) %Neumann boundary condition
   % ...
end
if Neum(2) %Neumann boundary condition
   % ...
end
if Neum(3) %Neumann boundary condition
   for j = 2:Mx
      u(1,j) = 2*rx*u(2,j) + ry*(u(1,j+1) + u(1,j-1)) ...
               + rxy*(G(1,j)*u(1,j) - F(1,j)) - rxdy*duyc(j);
   end
end
% ...
9.3 Elliptic PDEs with Neumann Boundary Condition
Consider the PDE (E9.1.1) (dealt with in Example 9.1)

$$\frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} = 0 \quad \text{for } 0 \le x \le 4,\ 0 \le y \le 4 \tag{P9.3.1}$$

with different boundary conditions of Neumann type, which were discussed
in Section 9.1. Modify the routine “poisson()” so that it can deal with the
Neumann boundary condition and declare it as

function [u,x,y] = poisson_Neuman(f,g,bx0,bxf,by0,byf,x0,xf,y0,yf,...)


where the third/fourth/fifth/sixth input arguments are supposed to carry the
functions of

$$u(x_0,y)\;/\;u(x_f,y)\;/\;u(x,y_0)\;/\;u(x,y_f)$$

or

$$\left.\frac{\partial u(x,y)}{\partial x}\right|_{x=x_0} \Big/ \left.\frac{\partial u(x,y)}{\partial x}\right|_{x=x_f} \Big/ \left.\frac{\partial u(x,y)}{\partial y}\right|_{y=y_0} \Big/ \left.\frac{\partial u(x,y)}{\partial y}\right|_{y=y_f}$$

and the seventh/eighth/ninth/tenth input arguments are to carry x0/xf/y0/yf
or [x0 1]/[xf 1]/[y0 1]/[yf 1], depending on whether each boundary condition
is of Dirichlet or Neumann type. Use it to solve the PDE with the
following boundary conditions and plot the solutions by using the MATLAB
command “mesh()”. Divide the solution region (domain) into $M_x \times M_y = 20 \times 20$ sections.

(cf) You may refer to the partial listing of the program shown above.

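(a)

$$\left.\frac{\partial u(x,y)}{\partial x}\right|_{x=0} = -\cos y,\quad u(4,y) = e^{y}\cos 4 - e^{4}\cos y \tag{P9.3.2}$$

$$\left.\frac{\partial u(x,y)}{\partial y}\right|_{y=0} = \cos x,\quad u(x,4) = e^{4}\cos x - e^{x}\cos 4 \tag{P9.3.3}$$

(b)

$$u(0,y) = e^{y} - \cos y,\quad \left.\frac{\partial u(x,y)}{\partial x}\right|_{x=4} = -e^{y}\sin 4 - e^{4}\cos y \tag{P9.3.4}$$

$$u(x,0) = \cos x - e^{x},\quad \left.\frac{\partial u(x,y)}{\partial y}\right|_{y=4} = e^{4}\cos x + e^{x}\sin 4 \tag{P9.3.5}$$

(c)

$$\left.\frac{\partial u(x,y)}{\partial x}\right|_{x=0} = -\cos y,\quad u(4,y) = e^{y}\cos 4 - e^{4}\cos y \tag{P9.3.6}$$

$$u(x,0) = \cos x - e^{x},\quad \left.\frac{\partial u(x,y)}{\partial y}\right|_{y=4} = e^{4}\cos x + e^{x}\sin 4 \tag{P9.3.7}$$

(d)

$$u(0,y) = e^{y} - \cos y,\quad \left.\frac{\partial u(x,y)}{\partial x}\right|_{x=4} = -e^{y}\sin 4 - e^{4}\cos y \tag{P9.3.8}$$

$$\left.\frac{\partial u(x,y)}{\partial y}\right|_{y=0} = \cos x,\quad u(x,4) = e^{4}\cos x - e^{x}\cos 4 \tag{P9.3.9}$$

(e)

$$\left.\frac{\partial u(x,y)}{\partial x}\right|_{x=0} = -\cos y,\quad \left.\frac{\partial u(x,y)}{\partial x}\right|_{x=4} = -e^{y}\sin 4 - e^{4}\cos y \tag{P9.3.10}$$

$$\left.\frac{\partial u(x,y)}{\partial y}\right|_{y=0} = \cos x,\quad u(x,4) = e^{4}\cos x - e^{x}\cos 4 \tag{P9.3.11}$$

(f)

$$\left.\frac{\partial u(x,y)}{\partial x}\right|_{x=0} = -\cos y,\quad u(4,y) = e^{y}\cos 4 - e^{4}\cos y \tag{P9.3.12}$$

$$\left.\frac{\partial u(x,y)}{\partial y}\right|_{y=0} = \cos x,\quad \left.\frac{\partial u(x,y)}{\partial y}\right|_{y=4} = e^{4}\cos x + e^{x}\sin 4 \tag{P9.3.13}$$

(g)

$$u(0,y) = e^{y} - \cos y,\quad \left.\frac{\partial u(x,y)}{\partial x}\right|_{x=4} = -e^{y}\sin 4 - e^{4}\cos y \tag{P9.3.14}$$

$$\left.\frac{\partial u(x,y)}{\partial y}\right|_{y=0} = \cos x,\quad \left.\frac{\partial u(x,y)}{\partial y}\right|_{y=4} = e^{4}\cos x + e^{x}\sin 4 \tag{P9.3.15}$$

(h)

$$\left.\frac{\partial u(x,y)}{\partial x}\right|_{x=0} = -\cos y,\quad \left.\frac{\partial u(x,y)}{\partial x}\right|_{x=4} = -e^{y}\sin 4 - e^{4}\cos y \tag{P9.3.16}$$

$$\left.\frac{\partial u(x,y)}{\partial y}\right|_{y=0} = \cos x,\quad \left.\frac{\partial u(x,y)}{\partial y}\right|_{y=4} = e^{4}\cos x + e^{x}\sin 4 \tag{P9.3.17}$$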
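All of the boundary data in (a)-(h) appear to be generated from the harmonic function $u(x,y) = e^y\cos x - e^x\cos y$ (the same function that appears as the true solution in Example 9.8 and in (P9.8.7)), so it can serve as the reference solution for error estimates. Its first partial derivatives are

$$\frac{\partial u}{\partial x} = -e^{y}\sin x - e^{x}\cos y,\qquad \frac{\partial u}{\partial y} = e^{y}\cos x + e^{x}\sin y$$

$$\left.\frac{\partial u}{\partial x}\right|_{x=0} = -\cos y,\quad \left.\frac{\partial u}{\partial x}\right|_{x=4} = -e^{y}\sin 4 - e^{4}\cos y,\quad \left.\frac{\partial u}{\partial y}\right|_{y=0} = \cos x,\quad \left.\frac{\partial u}{\partial y}\right|_{y=4} = e^{4}\cos x + e^{x}\sin 4$$

Note that in (h) all four conditions are of Neumann type, so the solution of Laplace's equation is determined only up to an additive constant; a comparison with the reference function is meaningful only after removing that constant offset.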


9.4 Parabolic PDEs: Heat Equations 

Modify the program “solve_heat.m” (in Section 9.2.3) so that it can solve
the following PDEs by using the explicit forward Euler method, the implicit
backward Euler method, and the Crank-Nicolson method.

(a)

$$\frac{\partial^2 u(x,t)}{\partial x^2} = \frac{\partial u(x,t)}{\partial t} \quad \text{for } 0 \le x \le 1,\ 0 \le t \le 0.1 \tag{P9.4.1}$$

with the initial/boundary conditions

$$u(x,0) = x^4,\quad u(0,t) = 0,\quad u(1,t) = 1 \tag{P9.4.2}$$

(i) With the solution region divided into M × N = 10 × 20 sections,
does the explicit forward Euler method converge? What is the value
of $r = A\Delta t/(\Delta x)^2$?

(ii) If you increase M and N to make M × N = 20 × 40 for better
accuracy, does the explicit forward Euler method still converge?
What is the value of $r = A\Delta t/(\Delta x)^2$?

(iii) What is the number N of subintervals along the t axis that we should
choose in order to keep the same value of r for M = 20? With that
value of r, does the explicit forward Euler method converge?
(b)

$$\cdots\,\frac{\partial^2 u(x,t)}{\partial x^2} = \frac{\partial u(x,t)}{\partial t} \quad \text{for } 0 \le x \le 1,\ 0 \le t \le 6000 \tag{P9.4.3}$$

with the initial/boundary conditions

$$u(x,0) = 2x + \sin(2\pi x),\quad u(0,t) = 0,\quad u(1,t) = 2 \tag{P9.4.4}$$

(i) With the solution region divided into M × N = 20 × 40 sections,
does the explicit forward Euler method converge? What is the value
of $r = A\Delta t/(\Delta x)^2$? Does the numerical stability condition (9.2.6)
seem very demanding?

(ii) If you increase M and N to make M × N = 40 × 160 for better
accuracy, does the explicit forward Euler method still converge?
What is the value of $r = A\Delta t/(\Delta x)^2$? Does the numerical stability
condition (9.2.6) seem very demanding?

(iii) With the solution region divided into M × N = 40 × 200 sections,
does the explicit forward Euler method converge? What is the value
of $r = A\Delta t/(\Delta x)^2$?


(c)

$$2\frac{\partial^2 u(x,t)}{\partial x^2} = \frac{\partial u(x,t)}{\partial t} \quad \text{for } 0 \le x \le \pi,\ 0 \le t \le 0.2 \tag{P9.4.5}$$

with the initial/boundary conditions

$$u(x,0) = \sin(2x),\quad u(0,t) = 0,\quad u(\pi,t) = 0 \tag{P9.4.6}$$

(i) By substituting

$$u(x,t) = \sin(2x)\,e^{-8t} \tag{P9.4.7}$$

into the above equation (P9.4.5), verify that this is a solution to
the PDE.

(ii) With the solution region divided into M × N = 40 × 100 sections,
does the explicit forward Euler method converge? What is the value
of $r = A\Delta t/(\Delta x)^2$?

(iii) If you increase N (the number of subintervals along the t axis) to
125 for improving the numerical stability, does the explicit forward
Euler method converge? What is the value of $r = A\Delta t/(\Delta x)^2$? Use
the MATLAB statements in the following box to find the maximum
absolute errors of the numerical solutions obtained by the three
methods. Which method yields the smallest error?

uo = inline('sin(2*x)*exp(-8*t)','x','t'); %true analytical solution
Uo = uo(x,t);
err = max(max(abs(u1 - Uo)))
(iv) If you increase N to 200, what is the value of $r = A\Delta t/(\Delta x)^2$? Find
the maximum absolute errors of the numerical solutions obtained
by the three methods as in (iii). Which method yields the smallest
error?

(d)

$$\frac{\partial^2 u(x,t)}{\partial x^2} = \frac{\partial u(x,t)}{\partial t} \quad \text{for } 0 \le x \le 1,\ 0 \le t \le 0.1 \tag{P9.4.8}$$

with the initial/boundary conditions

$$u(x,0) = \sin(\pi x) + \sin(3\pi x),\quad u(0,t) = 0,\quad u(1,t) = 0 \tag{P9.4.9}$$

(i) By substituting

$$u(x,t) = \sin(\pi x)\,e^{-\pi^2 t} + \sin(3\pi x)\,e^{-(3\pi)^2 t} \tag{P9.4.10}$$

into Eq. (P9.4.8), verify that this is a solution to the PDE.

(ii) With the solution region divided into M × N = 25 × 80 sections,
does the explicit forward Euler method converge? What is the value
of $r = A\Delta t/(\Delta x)^2$?

(iii) If you increase N (the number of subintervals along the t axis) to
100 for improving the numerical stability, does the explicit forward
Euler method converge? What is the value of $r = A\Delta t/(\Delta x)^2$? Find
the maximum absolute errors of the numerical solutions obtained by
the three methods as in (c)(iii).

(iv) If you increase N to 200, what is the value of $r = A\Delta t/(\Delta x)^2$? Find
the maximum absolute errors of the numerical solutions obtained by
the three methods as in (c)(iii). Which of the three methods gained
the most accuracy from increasing N?

9.5 Parabolic PDEs with Neumann Boundary Conditions 

Let us modify the routines “heat_exp()”, “heat_imp()”, and “heat_cn()”
(in Section 9.2) so that they can accommodate the heat equation (9.2.1) with
Neumann boundary conditions

$$\left.\frac{\partial u(x,t)}{\partial x}\right|_{x=x_0} = b'_{x_0}(t),\qquad \left.\frac{\partial u(x,t)}{\partial x}\right|_{x=x_f} = b'_{x_f}(t) \tag{P9.5.1}$$

(a) Consider the explicit forward Euler algorithm described by Eq. (9.2.3)

$$u_i^{k+1} = r\,(u_{i+1}^k + u_{i-1}^k) + (1-2r)\,u_i^k \quad \text{for } i = 1, 2, \ldots, M-1 \text{ with } r = A\frac{\Delta t}{(\Delta x)^2} \tag{P9.5.2}$$
In the case of a Dirichlet boundary condition, we don’t need to compute $u_0^{k+1}$
and $u_M^{k+1}$, because they are already given. But, in the case of the Neumann
boundary condition, we must compute them by using this equation for
i = 0 and M as

$$u_0^{k+1} = r\,(u_1^k + u_{-1}^k) + (1-2r)\,u_0^k \tag{P9.5.3a}$$

$$u_M^{k+1} = r\,(u_{M+1}^k + u_{M-1}^k) + (1-2r)\,u_M^k \tag{P9.5.3b}$$

and the boundary conditions approximated as

$$\frac{u_1^k - u_{-1}^k}{2\Delta x} = b'_0(k),\qquad u_{-1}^k = u_1^k - 2\,b'_0(k)\,\Delta x \tag{P9.5.4a}$$

$$\frac{u_{M+1}^k - u_{M-1}^k}{2\Delta x} = b'_M(k),\qquad u_{M+1}^k = u_{M-1}^k + 2\,b'_M(k)\,\Delta x \tag{P9.5.4b}$$

Substituting Eqs. (P9.5.4a,b) into Eq. (P9.5.3) yields

$$u_0^{k+1} = 2r\,\big(u_1^k - b'_0(k)\,\Delta x\big) + (1-2r)\,u_0^k \tag{P9.5.5a}$$

$$u_M^{k+1} = 2r\,\big(u_{M-1}^k + b'_M(k)\,\Delta x\big) + (1-2r)\,u_M^k \tag{P9.5.5b}$$

Modify the routine “heat_exp()” so that it can use this scheme to deal
with the Neumann boundary conditions for solving the heat equation and
declare it as

function [u,x,t] = heat_exp_Neuman(a,xfn,T,it0,bx0,bxf,M,N)

where the second input argument xfn and the fifth and sixth input arguments
bx0, bxf are supposed to carry [xf 0 1] and $b_{x_0}(t)$, $b'_{x_f}(t)$, respectively,
if the boundary condition at x0/xf is of Dirichlet/Neumann type, and
they are also supposed to carry [xf 1 1] and $b'_{x_0}(t)$, $b'_{x_f}(t)$, respectively,
if both of the boundary conditions at x0/xf are of Neumann type.

(b) Consider the implicit backward Euler algorithm described by Eq. (9.2.13),
which deals with the Neumann boundary condition at one end for
solving the heat equation (9.2.1). With reference to Eq. (9.2.13), modify
the routine “heat_imp()” so that it can solve the heat equation with
Neumann boundary conditions at both end points x0 and xf and declare
it as

function [u,x,t] = heat_imp_Neuman(a,xfn,T,it0,bx0,bxf,M,N)

(c) Consider the Crank-Nicolson algorithm described by Eq. (9.2.17), which
deals with the Neumann boundary condition at one end for solving the
heat equation (9.2.1). With reference to Eq. (9.2.17), modify the routine
“heat_cn()” so that it can solve the heat equation with Neumann
boundary conditions at both end points x0 and xf and declare it as

function [u,x,t] = heat_cn_Neuman(a,xfn,T,it0,bx0,bxf,M,N)
(d) Solve the following heat equation with three different boundary conditions
by using the three modified routines in (a), (b), (c) with M = 20, N
= 100 and find the maximum absolute errors of the three solutions as in
Problem 9.4(c)(iii).

$$\frac{\partial^2 u(x,t)}{\partial x^2} = \frac{\partial u(x,t)}{\partial t} \quad \text{for } 0 \le x \le 1,\ 0 \le t \le 0.1 \tag{P9.5.6}$$

with the initial/boundary conditions

(i)

$$u(x,0) = \sin(\pi x),\quad \left.\frac{\partial u(x,t)}{\partial x}\right|_{x=0} = \pi e^{-\pi^2 t},\quad u(x,t)\big|_{x=1} = 0 \tag{P9.5.7}$$

(ii)

$$u(x,0) = \sin(\pi x),\quad u(x,t)\big|_{x=0} = 0,\quad \left.\frac{\partial u(x,t)}{\partial x}\right|_{x=1} = -\pi e^{-\pi^2 t} \tag{P9.5.8}$$

(iii)

$$u(x,0) = \sin(\pi x),\quad \left.\frac{\partial u(x,t)}{\partial x}\right|_{x=0} = \pi e^{-\pi^2 t},\quad \left.\frac{\partial u(x,t)}{\partial x}\right|_{x=1} = -\pi e^{-\pi^2 t} \tag{P9.5.9}$$

Note that the true analytical solution is

$$u(x,t) = \sin(\pi x)\,e^{-\pi^2 t} \tag{P9.5.10}$$
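The boundary updates (P9.5.5a,b) are the only new ingredient in part (a), so here is a minimal sketch of a single forward-Euler step that applies them at both ends. It is a hypothetical illustration, not the book's heat_exp_Neuman(): the names bp0, bpM (for $b'_0(k)$, $b'_M(k)$) and the test profile are assumptions. A linear profile whose slope equals the imposed Neumann data is a steady state of the heat equation, so one step should leave it unchanged.

```matlab
% Hypothetical sketch (not the book's heat_exp_Neuman()): one forward-Euler
% step of A*u_xx = u_t with Neumann data at both ends, following (P9.5.2)
% for the interior nodes and (P9.5.5a,b) for the two boundary nodes.
A = 1; M = 20; dx = 1/M; dt = 0.001;
r = A*dt/dx^2;                 % 0.4 here, within r <= 1/2
x = (0:M)'*dx;                 % nodes x_0 ... x_M (MATLAB index i+1)
u = x;                         % test profile u = x, so du/dx = 1 everywhere
bp0 = 1; bpM = 1;              % b'_0(k), b'_M(k): slopes imposed at x_0, x_M
unew = zeros(M+1,1);
i = 2:M;                       % interior nodes x_1 ... x_{M-1}
unew(i) = r*(u(i+1) + u(i-1)) + (1 - 2*r)*u(i);          % (P9.5.2)
unew(1) = 2*r*(u(2) - bp0*dx) + (1 - 2*r)*u(1);          % (P9.5.5a)
unew(M+1) = 2*r*(u(M) + bpM*dx) + (1 - 2*r)*u(M+1);      % (P9.5.5b)
% The linear profile matches the Neumann data, so it is a steady state and
% the step should change it only by round-off.
step_change = max(abs(unew - u))
```

In a full routine the same three update lines would sit inside a loop over k, with bp0 and bpM evaluated from $b'_{x_0}(t)$ and $b'_{x_f}(t)$ at each time step.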


9.6 Hyperbolic PDEs: Wave Equations 

Modify the program “solve_wave.m” (in Section 9.3) so that it can solve
the following PDEs by using the explicit forward Euler method, the implicit
backward Euler method, and the Crank-Nicolson method.

(a)

$$4\frac{\partial^2 u(x,t)}{\partial x^2} = \frac{\partial^2 u(x,t)}{\partial t^2} \quad \text{for } 0 \le x \le 1,\ 0 \le t \le 1 \tag{P9.6.1}$$

with the initial/boundary conditions

$$u(x,0) = 0,\quad \left.\frac{\partial u(x,t)}{\partial t}\right|_{t=0} = 5\sin(\pi x),\quad u(0,t) = 0,\quad u(1,t) = 0 \tag{P9.6.2}$$

Note that the true analytical solution is

$$u(x,t) = \frac{2.5}{\pi}\sin(\pi x)\sin(2\pi t) \tag{P9.6.3}$$

(i) With the solution region divided into M × N = 20 × 50 sections,
what is the value of $r = A(\Delta t)^2/(\Delta x)^2$? Use the MATLAB statements
in Problem 9.4(c)(iii) to find the maximum absolute error of
the solution obtained by using the routine “wave()”.
(ii) With the solution region divided into M × N = 40 × 100 sections,
what is the value of r? Find the maximum absolute error of the
numerical solution.

(iii) If we increase M (the number of subintervals along the x axis) to 
50 for better accuracy, what is the value of r? Find the maximum 
absolute error of the numerical solution and determine whether it 
has been improved. 

(iv) If we increase the number M to 52, what is the value of r? Can 
we expect better accuracy in the light of the numerical stability 
condition (9.3.7)? Find the maximum absolute error of the numerical 
solution and determine whether it has been improved or not. 

(v) What do you think the best value of r is? 

(b)

$$6.25\frac{\partial^2 u(x,t)}{\partial x^2} = \frac{\partial^2 u(x,t)}{\partial t^2} \quad \text{for } 0 \le x \le \pi,\ 0 \le t \le 0.4\pi \tag{P9.6.4}$$

with the initial/boundary conditions

$$u(x,0) = \sin(2x),\quad \left.\frac{\partial u(x,t)}{\partial t}\right|_{t=0} = 0,\quad u(0,t) = 0,\quad u(\pi,t) = 0 \tag{P9.6.5}$$

Note that the true analytical solution is

$$u(x,t) = \sin(2x)\cos(5t) \tag{P9.6.6}$$

(i) With the solution region divided into M × N = 50 × 50 sections,
what is the value of $r = A(\Delta t)^2/(\Delta x)^2$? Find the maximum absolute
error of the solution obtained by using the routine “wave()”.

(ii) With the solution region divided into M x N = 50 x 49 sections, 
what is the value of r? Find the maximum absolute error of the 
numerical solution. 

(iii) If we increase N (the number of subintervals along the t axis) to 
51 for better accuracy, what is the value of r? Find the maximum 
absolute error of the numerical solution. 

(iv) What do you think the best value of r is? 


(c)

$$\frac{\partial^2 u(x,t)}{\partial x^2} = \frac{\partial^2 u(x,t)}{\partial t^2} \quad \text{for } 0 \le x \le 10,\ 0 \le t \le 10 \tag{P9.6.7}$$

with the initial/boundary conditions

$$u(x,0) = \begin{cases} (x-2)(3-x) & \text{for } 2 \le x \le 3 \\ 0 & \text{elsewhere} \end{cases} \tag{P9.6.8}$$

$$\left.\frac{\partial u(x,t)}{\partial t}\right|_{t=0} = 0,\quad u(0,t) = 0,\quad u(10,t) = 0 \tag{P9.6.9}$$

(i) With the solution region divided into M × N = 100 × 100 sections,
what is the value of $r = A(\Delta t)^2/(\Delta x)^2$?

(ii) Noting that the initial condition (P9.6.8) can be implemented by the
MATLAB statement

>>it0 = inline('(x-2).*(3-x).*(2<x&x<3)','x');

solve the PDE (P9.6.7) in the same way as in “solve_wave.m” and
make a dynamic picture out of the numerical solution, with the current
time printed on screen. Estimate the time when one of the two separated
pulses propagating leftward is reflected and reversed. How about the
time when the two separated pulses are reunited?
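A sketch for Problem 9.6, under stated assumptions: it tabulates $r = A(\Delta t)^2/(\Delta x)^2$ for the grids in parts (a) and (b) (the explicit three-level scheme needs $r \le 1$, presumably the condition (9.3.7) cited above), then solves part (c) with that explicit scheme and animates the result with the current time in the title. The helper rfun and the loop structure are illustrative choices, not the book's wave() routine; the first time step uses the standard second-order start for zero initial velocity.

```matlab
% Sketch (assumed scheme, not the book's wave()).
% r = A*dt^2/dx^2 for the explicit three-level scheme, which needs r <= 1.
rfun = @(A, xlen, T, M, N) A*(T./N).^2./(xlen./M).^2;
r_a = rfun(4, 1, 1, [20 40 50 52], [50 100 100 100])   % part (a)(i)-(iv)
r_b = rfun(6.25, pi, 0.4*pi, 50, [50 49 51])           % part (b)(i)-(iii)

A = 1; xf = 10; T = 10; M = 100; N = 100;              % part (c)
dx = xf/M; dt = T/N; r = A*dt^2/dx^2;                   % r = 1
x = (0:M)'*dx; t = (0:N)*dt;
it0 = @(x) (x-2).*(3-x).*(2<x & x<3);                  % u(x,0), (P9.6.8)
u = zeros(M+1, N+1);                                   % u(i,k) ~ u(x(i),t(k))
u(:,1) = it0(x);                                       % zero at x = 0 and x = 10
i = 2:M;                                               % interior nodes
% first step with du/dt(x,0) = 0: u^1 = u^0 + (r/2)*(second difference of u^0)
u(i,2) = u(i,1) + r/2*(u(i+1,1) - 2*u(i,1) + u(i-1,1));
for k = 2:N
   u(i,k+1) = r*(u(i+1,k) + u(i-1,k)) + 2*(1-r)*u(i,k) - u(i,k-1);
end
for k = 1:N+1                                          % dynamic picture
   plot(x, u(:,k)); axis([0 xf -0.3 0.3]);
   title(sprintf('t = %5.2f', t(k))); drawnow
end
```

With r = 1 this scheme reproduces d'Alembert's solution exactly at the grid points, which makes part (c) a convenient test case; for r slightly above 1 (for example M = 52 in part (a)) the scheme is expected to become unstable.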

9.7 FEM (Finite Element Method) 

For better accuracy/resolution, modify the program
“do_fem.m” (in Section 9.6) by appending the following lines

;-17/32 -31/64; -1/2 -17/32;-15/32 -31/64 
;17/32 31/64; 1/2 17/32; 15/32 31/64 

to the last part of the Node array N and replacing the last line of the subregion 
array S with 

26 32 33; 27 33 34; 28 32 34; 29 35 36; 

30 36 37; 31 35 37; 32 33 34; 35 36 37 

This is equivalent to refining the triangular mesh in the subregions nearest
to the point charges at (0.5, 0.5) and (-0.5, -0.5) as depicted in Fig. P9.7.
Plot the new solution obtained by running the modified program. You may
have to change a statement of the program as follows.

f962 = '(norm([x y]+[0.5 0.5])<1e-3)-(norm([x y]-[0.5 0.5])<1e-3)';



Figure P9.7 Refined triangular meshes. 
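To see where the appended nodes sit, one can average each group of three. The sketch below assumes the six appended rows are numbered 32-37 in N, so that the new rows "32 33 34" and "35 36 37" of S are triangles built entirely from them; the check then shows that each small triangle is centred on one of the point charges.

```matlab
% The six node coordinates appended to N in Problem 9.7
% (x in column 1, y in column 2), assumed to be nodes 32-37.
Nnew = [-17/32 -31/64; -1/2 -17/32; -15/32 -31/64;
         17/32  31/64;  1/2  17/32;  15/32  31/64];
centroid_neg = mean(Nnew(1:3,:))   % centroid of triangle 32-33-34
centroid_pos = mean(Nnew(4:6,:))   % centroid of triangle 35-36-37
```

Both centroids come out exactly at (-0.5, -0.5) and (0.5, 0.5), because all coordinates are dyadic fractions whose sums are exactly representable.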
9.8 PDEtool: GUI (Graphical User Interface) of MATLAB for Solving PDEs
(a) Consider the PDE

$$4\frac{\partial^2 u(x,y)}{\partial x^2} - 4\frac{\partial^2 u(x,y)}{\partial x\,\partial y} + \frac{\partial^2 u(x,y)}{\partial y^2} = 0 \quad \text{for } 0 \le x \le 1,\ 0 \le y \le 1 \tag{P9.8.1}$$

with the boundary conditions

$$u(0,y) = y\,e^{2y},\quad u(1,y) = (1+y)\,e^{1+2y},\quad u(x,0) = x\,e^{x},\quad u(x,1) = (x+1)\,e^{x+2} \tag{P9.8.2}$$

Noting that the field of coefficient c should be filled in as

[4 -2 -2 1]   (or, equivalently, [4 -2 1])

in the PDE specification dialog box with ‘Elliptic’ checked, and that the true
analytical solution is

$$u(x,y) = (x+y)\,e^{x+2y} \tag{P9.8.3}$$

use the PDEtool to solve this PDE and fill in Table P9.8.1 with the
maximum absolute error and the number of nodes together with those of
Problem 9.2(d) for comparison.

You can refer to Example 9.8 for the procedure to get the numerical
value of the maximum absolute error. Notice that the number of nodes is
the number of columns of p, which is obtained by clicking ‘Export_Mesh’
in the Mesh pull-down menu and then clicking the OK button in the
Export dialog box. You can also refer to Example 9.10 for the usage
of ‘Adaptive Mesh’, but in this case you only have to check the box
on the left of ‘Adaptive Mode’ and click the OK button in the ‘Solve
Parameters’ dialog box opened by clicking ‘Parameters’ in the Solve
pull-down menu; then the mesh is adaptively refined every time you
click the = button in the tool-bar to get the solution. With the box on the
left of ‘Adaptive Mode’ unchecked in the ‘Solve Parameters’ dialog box,
the mesh is nonadaptively refined every time you click ‘Refine Mesh’
in the Mesh pull-down menu. You can restore the previous mesh by
clicking ‘Undo Mesh Change’ in the Mesh pull-down menu.

Table P9.8.1 The Maximum Absolute Error and the Number of Nodes

                                     The Maximum       The Number
                                     Absolute Error    of Nodes
poisson()                            1.9256            41 x 41
PDEtool with Initialize Mesh         0.1914            177
PDEtool with Refine Mesh
PDEtool with second Refine Mesh
PDEtool with Adaptive Mesh
PDEtool with second Adaptive Mesh

(b) Consider the PDE 


$$\frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} = 0 \quad \text{for } 0 \le x \le 4,\ 0 \le y \le 4 \tag{P9.8.4}$$

with the Dirichlet/Neumann boundary conditions

$$u(0,y) = e^{y} - \cos y,\quad \left.\frac{\partial u(x,y)}{\partial x}\right|_{x=4} = -e^{y}\sin 4 - e^{4}\cos y \tag{P9.8.5}$$

$$\left.\frac{\partial u(x,y)}{\partial y}\right|_{y=0} = \cos x,\quad \left.\frac{\partial u(x,y)}{\partial y}\right|_{y=4} = e^{4}\cos x + e^{x}\sin 4 \tag{P9.8.6}$$

Noting that the true analytical solution is

$$u(x,y) = e^{y}\cos x - e^{x}\cos y \tag{P9.8.7}$$

use the PDEtool to solve this PDE and fill in Table P9.8.2 with the
maximum absolute error and the number of nodes together with those of
Problem 9.3(g) for comparison.

Table P9.8.2 The Maximum Absolute Error and the Number of Nodes

                                     The Maximum       The Number
                                     Absolute Error    of Nodes
poisson()                            0.2005            21 x 21
PDEtool with Initialize Mesh         0.5702            177
PDEtool with Refine Mesh
PDEtool with second Refine Mesh
PDEtool with Adaptive Mesh
PDEtool with second Adaptive Mesh

(c) Consider the PDE

$$2\frac{\partial^2 u(x,t)}{\partial x^2} = \frac{\partial u(x,t)}{\partial t} \quad \text{for } 0 \le x \le \pi,\ 0 \le t \le 0.2 \tag{P9.8.8}$$

with the initial/boundary conditions

$$u(x,0) = \sin(2x),\quad u(0,t) = 0,\quad u(\pi,t) = 0 \tag{P9.8.9}$$

The true analytical solution is

$$u(x,t) = \sin(2x)\,e^{-8t} \tag{P9.8.10}$$

Use the PDEtool to solve this PDE and fill in Table P9.8.3 with the
maximum absolute error and the number of nodes together with those
obtained with the MATLAB routine ‘heat_CN()’ in Problem 9.4(c) for
comparison.

Table P9.8.3 The Maximum Absolute Error and the Number of Nodes

                                     The Maximum       The Number
                                     Absolute Error    of Nodes
heat_CN()                            7.5462 x 10^-4    41 x 101
PDEtool with Initialize Mesh
PDEtool with Refine Mesh
PDEtool with second Refine Mesh

In order to do this job, take the following steps.

(1) Click the rectangle (□) button in the tool-bar and click-and-drag on the
graphic region to create a rectangular domain. Then, double-click
the rectangle to open the Object dialog box and set
the Left/Bottom/Width/Height to 0/0/pi/0.01 to make a long
rectangular domain.

(cf) Even though PDEtool was originally designed to deal with only 2-D PDEs,
we can use it to solve 1-D PDEs like (P9.8.8) by proceeding in this way.

(2) Click the ∂Ω button in the tool-bar, double-click the upper/lower
boundary segments to set the homogeneous Neumann boundary condition
(g = 0, q = 0), and double-click the left/right boundary segments
to set the Dirichlet boundary condition (h = 1, r = 0) as given
by Eq. (P9.8.9).

(3) Open the PDE specification dialog box by clicking the PDE button,
check the box on the left of ‘Parabolic’ as the type of PDE, and set
its parameters in Eq. (9.5.5) as c = 2, a = 0, f = 0 and d = 1, which
corresponds to Eq. (P9.8.8).

(4) Click ‘Parameters’ in the Solve pull-down menu to set the time range,
say, as 0:0.002:0.2 and to set the initial conditions as Eq. (P9.8.9).

(5) In the Plot selection dialog box opened by clicking the ^ button,
check the box before Height and click the Plot button. If you
want to plot the solution graph at a time other than the final time,
select the time for plot from {0, 0.002, 0.004, ..., 0.2} in the far-right
field of the Plot selection dialog box and click the Plot button again.

(6) If you want to see a movie-like dynamic picture of the solution graph, 
check the box before Animation and then click the Plot button in the 
Plot selection dialog box. 

(7) Click ‘Export_Mesh’ in the Mesh pull-down menu, and then click the 
OK button in the Export dialog box to extract the mesh data {p, e, t}. 
Also click ‘Export_Solution’ in the Solve pull-down menu, and then
click the OK button in the Export dialog box to extract the solution
u. Now, you can estimate how far the graphical/numerical solution
deviates from the true solution (P9.8.10) by typing the following
statements into the MATLAB command window:

>>x = p(1,:)'; y = p(2,:)'; %x,y coordinates of nodes in columns
>>tt = 0:0.002:0.2; %time vector in row (same as the time range in step (4))
>>err = sin(2*x)*exp(-8*tt) - u; %deviation from true sol. (P9.8.10)
>>err_max = max(max(abs(err))) %maximum absolute error over nodes and times


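(d) Consider the PDE

$$\frac{\partial^2 u(x,t)}{\partial x^2} = \frac{\partial^2 u(x,t)}{\partial t^2} \quad \text{for } 0 \le x \le 10,\ 0 \le t \le 10 \tag{P9.8.11}$$

with the initial/boundary conditions

$$u(x,0) = \begin{cases} (x-2)(3-x) & \text{for } 2 \le x \le 3 \\ 0 & \text{elsewhere} \end{cases},\qquad \left.\frac{\partial u(x,t)}{\partial t}\right|_{t=0} = 0 \tag{P9.8.12}$$

$$u(0,t) = 0,\quad u(10,t) = 0 \tag{P9.8.13}$$

Use the PDEtool to make a dynamic picture out of the solution for
this PDE and see if the result is about the same as that obtained in
Problem 9.6(c) in terms of the time when one of the two separated pulses
propagating leftward is reflected and reversed and the time when the two
separated pulses are reunited.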

(cf) Even though PDEtool was originally designed to solve only 2-D PDEs, we can
solve a 1-D PDE like (P9.8.11) by proceeding as follows:

( 0 ) In the PDE toolbox window, adjust the ranges of the x axis and the 
y axis to [—0.5 10.5] and [—0.01 +0.01], respectively, in the box 
opened by clicking ‘Axes_Limits’ in the Options pull-down menu. 

(1) Click the E_l button in the tool-bar and click-and-drag on the graphic 
region to create a long rectangle of domain ranging from x 0 = 0 to 
Xf = 10. Then, double-click the rectangle to open the Object dialog 
box and set the Left/Bottom/Width/Height to 0/—0.01/10/0.02. 

(2) Click the ∂Ω (boundary mode) button in the toolbar, double-click the upper/lower
boundary segments to set the homogeneous Neumann boundary condition
(g = 0, q = 0), and double-click the left/right boundary segments to set
the Dirichlet boundary condition (h = 1, r = 0) as given by Eq. (P9.8.13).

(3) Open the PDE specification dialog box by clicking the PDE button,
check the box on the left of ‘Hyperbolic’ as the type of PDE, and
set its parameters in Eq. (P9.8.11) as c = 1, a = 0, f = 0, and d = 1.
(See Fig. 9.15a.)
(4) Click ‘Parameters’ in the Solve pull-down menu to set the time range
to, say, 0:0.2:10, the boundary condition as (P9.8.13), and the
initial conditions as (P9.8.12). (See Fig. 9.15b and Problem 9.6(c)(ii).)

(5) In the Plot selection dialog box opened by clicking the plot button in the toolbar,
check the box before ‘Height’ and the box before ‘Animation’, and
then click the Plot button in the Plot selection dialog box to see a
movie-like dynamic picture of the solution graph.

(6) If you want better resolution in the solution graph, click Mesh
in the top menu bar and click ‘Refine Mesh’ in the Mesh pull-down
menu. Then, select Plot in the top menu bar or press CTRL+P (^P)
on the keyboard and click ‘Plot_Solution’ in the Plot pull-down menu
to see a smoother animation.

(7) To estimate the time when one of the two separated pulses
propagating leftward is reflected and reversed, and the time when
the two separated pulses are reunited, count the animation frames,
noting that each frame corresponds to 0.2 s according to the time
range set in step (4).

(8) If you want to save the PDEtool program, click File in the top menu 
bar, click ‘Save_As’ in the File pull-down menu, and input the file 
name of your choice. 



APPENDIX A
Theorem A.1. Mean Value Theorem¹. Let a function $f(x)$ be continuous on
the interval $[a, b]$ and differentiable over $(a, b)$. Then, there exists at least one
point $\xi$ between $a$ and $b$ at which

$$f'(\xi) = \frac{f(b) - f(a)}{b - a}, \qquad f(b) = f(a) + f'(\xi)(b - a) \qquad \text{(A.1)}$$

In other words, the curve of such a function $f(x)$ has, at some point $\xi \in (a, b)$,
the same slope as the straight line connecting the two end points $(a, f(a))$ and
$(b, f(b))$ of the curve, as in Fig. A.1.



Figure A.1 Mean value theorem.
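The theorem can be illustrated numerically. The MATLAB sketch below assumes the example function f(x) = e^x on [0, 1] (not taken from the text) and uses fzero() to locate a point ξ at which the derivative equals the slope of the chord; for this particular f the exact answer is ξ = ln(e - 1) ≈ 0.5413.

```matlab
% Mean value theorem (A.1), illustrated for the example f(x) = exp(x) on [a, b] = [0, 1]
f  = @(x) exp(x);                 % example function (assumption, not from the text)
df = @(x) exp(x);                 % its derivative f'(x)
a = 0; b = 1;
slope = (f(b) - f(a))/(b - a);    % slope of the chord from (a, f(a)) to (b, f(b))
% f'(x) - slope changes sign on [a, b], so fzero() can bracket the root xi
xi = fzero(@(x) df(x) - slope, [a b]);
fprintf('xi = %.6f, f''(xi) = %.6f, chord slope = %.6f\n', xi, df(xi), slope)
```

Any differentiable f can be substituted, as long as f'(x) - slope actually changes sign on the bracketing interval passed to fzero().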


¹ See the website http://www.maths.abdn.ac.uk/~igc/testing/tch/ma2001/notes/notes.html.
Theorem A.2. Taylor Series Theorem¹. If a function $f(x)$ is continuous and
its derivatives up to order $(K + 1)$ are also continuous on an open interval $D$
containing some point $a$, then the value of the function $f(x)$ at any point $x \in D$
can be represented by

$$f(x) = \sum_{k=0}^{K} \frac{f^{(k)}(a)}{k!}(x - a)^k + R_{K+1}(x) \qquad \text{(A.2)}$$

where the first term of the right-hand side is called the $K$th-degree Taylor polynomial,
and the second term, called the remainder (error) term, is

$$R_{K+1}(x) = \frac{f^{(K+1)}(\xi)}{(K+1)!}(x - a)^{K+1} \quad \text{for some } \xi \text{ between } a \text{ and } x \qquad \text{(A.3)}$$

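Moreover, if the function $f(x)$ has continuous derivatives of all orders on $D$
and the remainder $R_{K+1}(x)$ tends to zero as $K \to \infty$, then the above
representation becomes

$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x - a)^k$$

which is called the (infinite) Taylor series expansion of $f(x)$ about $a$.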
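As a quick numerical illustration of (A.2) and (A.3), the MATLAB sketch below evaluates the Kth-degree Taylor polynomial of the example function f(x) = e^x about a = 0 (an assumption chosen because every derivative of e^x is e^x) and compares the actual error with the remainder bound obtained by taking the largest possible value of f^(K+1)(ξ) = e^ξ for ξ between a and x.

```matlab
% Kth-degree Taylor polynomial (A.2) of the example f(x) = exp(x) about a, evaluated at x
K = 5; a = 0; x = 1;
k = 0:K;
TK = sum(exp(a)*(x - a).^k./factorial(k));   % f^(k)(a) = exp(a) for every k
err = abs(exp(x) - TK);                      % actual remainder |R_{K+1}(x)|
% (A.3): |R_{K+1}(x)| <= max over xi of exp(xi)*|x - a|^(K+1)/(K+1)!
bound = exp(max(a, x))*abs(x - a)^(K + 1)/factorial(K + 1);
fprintf('T_%d(%g) = %.8f, error = %.3e, bound = %.3e\n', K, x, TK, err, bound)
```

Increasing K shrinks both the error and the bound, which is the situation in which the infinite series converges to f(x).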



APPENDIX B

MATRIX OPERATIONS/PROPERTIES


B.1 ADDITION AND SUBTRACTION 


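$$A \pm B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & & \vdots \\ a_{M1} & a_{M2} & \cdots & a_{MN} \end{bmatrix} \pm \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1N} \\ b_{21} & b_{22} & \cdots & b_{2N} \\ \vdots & \vdots & & \vdots \\ b_{M1} & b_{M2} & \cdots & b_{MN} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1N} \\ c_{21} & c_{22} & \cdots & c_{2N} \\ \vdots & \vdots & & \vdots \\ c_{M1} & c_{M2} & \cdots & c_{MN} \end{bmatrix} = C \qquad \text{(B.1.1)}$$

with

$$c_{mn} = a_{mn} \pm b_{mn} \qquad \text{(B.1.2)}$$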


B.2 MULTIPLICATION 



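$$AB = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1K} \\ a_{21} & a_{22} & \cdots & a_{2K} \\ \vdots & \vdots & & \vdots \\ a_{M1} & a_{M2} & \cdots & a_{MK} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1N} \\ b_{21} & b_{22} & \cdots & b_{2N} \\ \vdots & \vdots & & \vdots \\ b_{K1} & b_{K2} & \cdots & b_{KN} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1N} \\ c_{21} & c_{22} & \cdots & c_{2N} \\ \vdots & \vdots & & \vdots \\ c_{M1} & c_{M2} & \cdots & c_{MN} \end{bmatrix} = C \qquad \text{(B.2.1)}$$

with

$$c_{mn} = \sum_{k=1}^{K} a_{mk} b_{kn} \qquad \text{(B.2.2)}$$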


(cf) For this multiplication to be defined, the number of columns of A must equal the
number of rows of B.

(cf) Note that the commutative law does not hold for matrix multiplication; that is,
in general, AB ≠ BA.
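The definition (B.2.2) translates directly into a triple loop. This MATLAB sketch (the matrices are arbitrary examples, not from the text) checks the loop against the built-in product A*B and shows a pair of square matrices for which AB ≠ BA.

```matlab
% Matrix product by the definition (B.2.2): c_mn = sum over k of a_mk*b_kn
A = [1 2 3; 4 5 6];          % M x K = 2 x 3 (example values)
B = [1 0; 0 1; 1 1];         % K x N = 3 x 2 (example values)
[M, K] = size(A); [K2, N] = size(B);
assert(K == K2, 'number of columns of A must equal number of rows of B')
C = zeros(M, N);
for m = 1:M
  for n = 1:N
    for k = 1:K
      C(m, n) = C(m, n) + A(m, k)*B(k, n);
    end
  end
end
disp(C)                      % [4 5; 10 11], the same as A*B
% Matrix multiplication is not commutative in general
P = [1 1; 0 1]; Q = [1 0; 1 1];
disp(P*Q - Q*P)              % nonzero, so PQ ~= QP
```

In practice the built-in A*B should be preferred; the loop only spells out the index bookkeeping.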


B.3 DETERMINANT 


The determinant of a K × K (square) matrix $A = [a_{mn}]$ is defined by

$$\det(A) = |A| = \sum_{k=1}^{K} a_{kn}(-1)^{k+n} M_{kn} \quad \text{or} \quad \sum_{k=1}^{K} a_{mk}(-1)^{m+k} M_{mk} \quad \text{for any fixed } 1 \le n \le K \text{ or } 1 \le m \le K \qquad \text{(B.3.1)}$$

where the minor $M_{kn}$ is the determinant of the $(K-1) \times (K-1)$ (minor)
matrix formed by removing the $k$th row and the $n$th column from $A$, and
$A_{kn} = (-1)^{k+n} M_{kn}$ is called the cofactor of $a_{kn}$.

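In particular, the determinants of a 2 × 2 matrix $A_{2\times 2}$ and a 3 × 3 matrix
$A_{3\times 3}$ are

$$\det(A_{2\times 2}) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = \sum_{k=1}^{2} a_{kn}(-1)^{k+n} M_{kn} = a_{11}a_{22} - a_{12}a_{21} \qquad \text{(B.3.2)}$$

$$\begin{aligned} \det(A_{3\times 3}) &= \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \\ &= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) \end{aligned} \qquad \text{(B.3.3)}$$

Note the following properties.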


• A matrix is singular if and only if its determinant is zero.

• The determinant of a matrix equals the product of its eigenvalues.

• If A is upper/lower triangular, having only zeros below/above the diagonal
in each column, its determinant is the product of the diagonal elements.

• $\det(A^T) = \det(A)$; $\det(AB) = \det(A)\det(B)$; $\det(A^{-1}) = 1/\det(A)$
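The cofactor expansion (B.3.3) and the properties just listed can be checked numerically. In this MATLAB sketch the matrices A and B are arbitrary examples (det(A) = 4, det(B) = 1), not taken from the text.

```matlab
% Cofactor expansion (B.3.3) of a 3 x 3 determinant along the first row,
% compared with det() and with the properties listed above
A = [2 -1 0; -1 2 -1; 0 -1 2];   % example matrix, det(A) = 4
B = [1 2 0; 0 1 0; 3 0 1];       % another example matrix, det(B) = 1
d3 = A(1,1)*(A(2,2)*A(3,3) - A(2,3)*A(3,2)) ...
   - A(1,2)*(A(2,1)*A(3,3) - A(2,3)*A(3,1)) ...
   + A(1,3)*(A(2,1)*A(3,2) - A(2,2)*A(3,1));
fprintf('cofactor expansion: %g, det(A): %g, prod(eig(A)): %g\n', ...
        d3, det(A), prod(eig(A)))
fprintf('det(A.''): %g, det(A*B): %g, det(A)*det(B): %g, det(inv(A)): %g\n', ...
        det(A.'), det(A*B), det(A)*det(B), det(inv(A)))
```

The built-in det() uses LU factorization rather than cofactor expansion, so the two agree only up to rounding for general matrices.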
B.4 EIGENVALUES AND EIGENVECTORS OF A MATRIX²

The eigenvalue or characteristic value and its corresponding eigenvector or
characteristic vector of an N × N matrix A are defined to be a scalar λ and a nonzero
vector v satisfying

$$A\mathbf{v} = \lambda\mathbf{v} \;\Leftrightarrow\; (A - \lambda I)\mathbf{v} = \mathbf{0} \quad (\mathbf{v} \ne \mathbf{0}) \qquad \text{(B.4.1)}$$

where (λ, v) is called an eigenpair; an N × N matrix A has N eigenvalues, counted
with multiplicity.

The eigenvalues of a matrix can be computed as the roots of the characteristic
equation

$$|A - \lambda I| = 0 \qquad \text{(B.4.2)}$$

and the eigenvector corresponding to an eigenvalue $\lambda_i$ can be obtained by
substituting $\lambda_i$ into Eq. (B.4.1) and solving it for v.

Note the following properties.

• If A is real and symmetric, all the eigenvalues are real-valued.

• If A is symmetric and positive definite, all the eigenvalues are real and
positive.

• If v is an eigenvector of A, so is cv for any nonzero scalar c.
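In MATLAB the eigenpairs (B.4.1) are usually obtained with eig() rather than by solving (B.4.2) by hand. The sketch below, using an arbitrary 2 × 2 example matrix, checks that AV = VD holds, that the eigenvalues coincide with the roots of the characteristic polynomial returned by poly(), and that a scaled eigenvector is still an eigenvector.

```matlab
% Eigenpairs (B.4.1) from eig(), compared with the roots of the characteristic polynomial (B.4.2)
A = [2 1; 1 2];                 % example symmetric matrix with eigenvalues 1 and 3
[V, D] = eig(A);                % columns of V: eigenvectors, diag(D): eigenvalues
lambda = diag(D);
residual = norm(A*V - V*D);     % ~0 because A*v = lambda*v for every eigenpair
lambda_roots = sort(roots(poly(A)));  % poly(A): coefficients of det(lambda*I - A)
disp([sort(lambda) lambda_roots])
v = V(:, 1); c = -3;            % a nonzero multiple of an eigenvector ...
disp(norm(A*(c*v) - lambda(1)*(c*v)))  % ... is still an eigenvector (~0)
```

Root-finding on the characteristic polynomial is fine for illustration but is numerically inferior to eig() for larger matrices.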


B.5 INVERSE MATRIX 


The inverse matrix of a K × K (square) matrix $A = [a_{mn}]$ is denoted by $A^{-1}$
and defined to be the matrix that, when premultiplied or postmultiplied by A, yields
an identity matrix, that is,

$$A \times A^{-1} = A^{-1} \times A = I \qquad \text{(B.5.1)}$$

An element of the inverse matrix $A^{-1} = [\alpha_{mn}]$ can be computed as

$$\alpha_{mn} = \frac{A_{nm}}{\det(A)} = \frac{(-1)^{m+n} M_{nm}}{\det(A)} \qquad \text{(B.5.2)}$$

where $M_{kn}$ is the minor of $a_{kn}$ and $A_{kn} = (-1)^{k+n} M_{kn}$ is the cofactor
of $a_{kn}$.


² See the website http://www.sosmath.com/index.html or http://www.psc.edu/~burkardt/papers/linear_glossary.html.
Note that a square matrix A is invertible/nonsingular if and only if

• No eigenvalue of A is zero, or equivalently,

• The rows (and the columns) of A are linearly independent, or equivalently,

• The determinant of A is nonzero.
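The cofactor formula for the inverse can be coded directly. This MATLAB sketch (the 3 × 3 matrix is an arbitrary nonsingular example with determinant 9) builds each element α_mn = (-1)^(m+n) M_nm / det(A), where M_nm is the minor obtained by deleting row n and column m, and compares the result with inv().

```matlab
% Inverse by cofactors: alpha_mn = (-1)^(m+n)*M_nm/det(A)
A = [4 7 2; 3 6 1; 2 5 3];     % example nonsingular matrix, det(A) = 9
K = size(A, 1);
Ainv = zeros(K);
for m = 1:K
  for n = 1:K
    Mnm = det(A([1:n-1, n+1:K], [1:m-1, m+1:K]));  % minor: delete row n, column m
    Ainv(m, n) = (-1)^(m+n)*Mnm/det(A);
  end
end
disp(norm(Ainv - inv(A)))      % agreement with inv() (~0)
disp(A*Ainv)                   % identity matrix, as required by A*inv(A) = I
```

The cofactor formula needs K^2 determinants, so it is useful mainly for small or symbolic matrices; inv() or the backslash operator is the practical choice numerically.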


B.6 SYMMETRIC/HERMITIAN MATRIX 

A square matrix A is said to be symmetric if it is equal to its transpose, that is,

$$A^T = A \qquad \text{(B.6.1)}$$

A complex-valued matrix is said to be Hermitian if it is equal to its complex
conjugate transpose, that is,

$$A = A^{*T} \quad \text{where } {}^{*} \text{ means the conjugate} \qquad \text{(B.6.2)}$$

Note the following properties of a real symmetric/Hermitian matrix.

• All the eigenvalues are real.

• If all the eigenvalues are distinct, the normalized eigenvectors form an
orthogonal/unitary matrix U.
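A short MATLAB check of these properties, for an arbitrary real symmetric example matrix: eig() returns real eigenvalues and normalized eigenvectors, and the eigenvector matrix U satisfies U'U = I, which also gives the decomposition A = U D U'.

```matlab
% Real symmetric matrix: real eigenvalues, orthogonal matrix of eigenvectors
A = [4 1 2; 1 3 0; 2 0 5];     % example real symmetric matrix
assert(isequal(A, A.'))        % (B.6.1)
[U, D] = eig(A);               % columns of U: normalized eigenvectors
disp(diag(D).')                % real eigenvalues
disp(norm(U.'*U - eye(3)))     % ~0: U is orthogonal
disp(norm(U*D*U.' - A))        % ~0: A = U*D*U'
```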


B.7 ORTHOGONAL/UNITARY MATRIX 

A nonsingular (square) matrix A is said to be orthogonal if its transpose is equal
to its inverse, that is,

$$A^T A = I, \quad A^T = A^{-1} \qquad \text{(B.7.1)}$$

A complex-valued (square) matrix is said to be unitary if its conjugate transpose
is equal to its inverse, that is,

$$A^{*T} A = I, \quad A^{*T} = A^{-1} \qquad \text{(B.7.2)}$$

Note the following properties of an orthogonal/unitary matrix.

• The magnitude (absolute value) of every eigenvalue is one.

• The product of two orthogonal/unitary matrices is also orthogonal/unitary:
$(AB)^{*T}(AB) = B^{*T}(A^{*T}A)B = B^{*T}B = I$.
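Givens rotation matrices are a convenient family of orthogonal matrices for checking these properties. The angles in this MATLAB sketch are arbitrary examples.

```matlab
% Givens rotations are orthogonal (B.7.1); their eigenvalues exp(+-j*theta) have magnitude 1
th1 = pi/6; th2 = 0.4;                        % example angles
G1 = [cos(th1) sin(th1); -sin(th1) cos(th1)];
G2 = [cos(th2) sin(th2); -sin(th2) cos(th2)];
disp(norm(G1.'*G1 - eye(2)))                  % ~0: G1'*G1 = I
disp(norm(G1.' - inv(G1)))                    % ~0: G1' = inv(G1)
disp(abs(eig(G1)).')                          % [1 1]
G12 = G1*G2;
disp(norm(G12.'*G12 - eye(2)))                % ~0: the product is orthogonal too
```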
B.8 PERMUTATION MATRIX

A matrix P having exactly one nonzero element, of value 1, in each row and each column
is called a permutation matrix and has the following properties.

• Premultiplication/postmultiplication of a matrix A by a permutation matrix
P (i.e., PA or AP) permutes the rows/columns of A, respectively.

• A permutation matrix P is orthogonal, that is, $P^T P = I$.
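A permutation matrix can be built by reordering the rows of an identity matrix. In this MATLAB sketch the permutation p and the test matrix magic(3) are arbitrary examples; premultiplication reorders rows by p, postmultiplication reorders columns by the inverse permutation q.

```matlab
% Permutation matrix: exactly one 1 in each row and each column
p = [3 1 2];                    % example permutation
I3 = eye(3);
P = I3(p, :);                   % rows of the identity taken in the order p
A = magic(3);                   % example matrix
disp(P*A)                       % rows of A reordered: equals A(p, :)
[~, q] = sort(p);               % q: the inverse permutation of p
disp(A*P)                       % columns of A reordered: equals A(:, q)
disp(P.'*P)                     % identity: P is orthogonal
```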


B.9 RANK 

The rank of an M x N matrix is the number of linearly independent 
rows/columns and if it equals min(M, N), then the matrix is said to be of 
maximal or full rank; otherwise, the matrix is said to be rank-deficient or to 
have rank-deficiency. 


B.10 ROW SPACE AND NULL SPACE

The row space of an M × N matrix A, denoted by $\mathcal{R}(A)$, is the space spanned
by the row vectors, that is, the set of all possible linear combinations of the row
vectors of A, which can be expressed as $A^T\mathbf{a}$ with an M-dimensional column vector
$\mathbf{a}$. On the other hand, the null space of the matrix A, denoted by $\mathcal{N}(A)$, is the
space orthogonal (perpendicular) to the row space, that is, the set of all
N-dimensional vectors x satisfying Ax = 0.
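The rank and the null space can be computed with rank() and null(). In this MATLAB sketch the example matrix has its second row equal to twice the first, so it is rank-deficient; the check confirms that A x = 0 on the null space and that rank plus nullity equals the number of columns N.

```matlab
% Rank (B.9) and null space (B.10) of a rank-deficient example matrix
A = [1 2 3; 2 4 6; 1 0 1];   % row 2 = 2*row 1, so rank(A) = 2
r = rank(A);                 % number of linearly independent rows/columns
Z = null(A);                 % orthonormal basis of the null space N(A)
fprintf('rank = %d, nullity = %d\n', r, size(Z, 2))
disp(norm(A*Z))              % ~0: A*x = 0 for every x in N(A)
```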


B.11 ROW ECHELON FORM 

A matrix is said to be of row echelon form if 

• Each nonzero row (a row having at least one nonzero element) has a 1 as its first
nonzero element.

• The leading 1 in a row is in a column to the right of the leading 1 in the
row above it.

• All-zero rows are below the rows that have at least one nonzero element. 

A matrix is said to be of reduced row echelon form if it satisfies the above 
conditions and, additionally, each column containing a leading 1 has no other 
nonzero elements. 
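Any matrix, singular or rectangular, can be transformed into this form through
the Gaussian elimination procedure (i.e., a series of elementary row operations)
or, equivalently, by using the MATLAB built-in routine “rref()”. For example,
we have

$$A = \begin{bmatrix} 0 & 0 & 1 & 3 \\ 2 & 4 & 0 & -8 \\ 1 & 2 & 1 & -1 \end{bmatrix} \xrightarrow{\text{row change}} \begin{bmatrix} 2 & 4 & 0 & -8 \\ 0 & 0 & 1 & 3 \\ 1 & 2 & 1 & -1 \end{bmatrix} \xrightarrow[\text{row 3} - \text{row 1}]{\text{row 1}/2} \begin{bmatrix} 1 & 2 & 0 & -4 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 1 & 3 \end{bmatrix} \xrightarrow{\text{row 3} - \text{row 2}} \begin{bmatrix} 1 & 2 & 0 & -4 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} = \text{rref}(A)$$

Once this form is obtained, it is easy to read off the rank (the number of nonzero
rows) and, if the matrix is invertible, to compute its determinant and inverse.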
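The elimination example can be reproduced with rref(). This MATLAB sketch uses a matrix consistent with the row operations above, counts the nonzero rows of the result to get the rank, and also shows the common trick rref([B I]) = [I inv(B)] for an arbitrary invertible example matrix B.

```matlab
% Reduced row echelon form of a matrix consistent with the elimination example above
A = [0 0 1 3; 2 4 0 -8; 1 2 1 -1];
R = rref(A)                      % expected: [1 2 0 -4; 0 0 1 3; 0 0 0 0]
r = sum(any(abs(R) > 0, 2))      % number of nonzero rows of R
rank(A)                          % agrees with r
% For an invertible B, rref([B I]) = [I inv(B)]
B = [2 1; 1 1];                  % example invertible matrix
RB = rref([B eye(2)]);
Binv = RB(:, 3:4)                % [1 -1; -1 2]
```

rref() decides which entries count as zero using a tolerance, so for ill-conditioned matrices rank() (based on the SVD) is the more reliable way to get the rank.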


B.12 POSITIVE DEFINITENESS 

A square matrix A is said to be positive definite if 

$$\mathbf{x}^{*T} A \mathbf{x} > 0 \quad \text{for any nonzero vector } \mathbf{x} \qquad \text{(B.12.1)}$$

A square matrix A is said to be positive semidefinite if

$$\mathbf{x}^{*T} A \mathbf{x} \ge 0 \quad \text{for any nonzero vector } \mathbf{x} \qquad \text{(B.12.2)}$$

Note the following properties of a positive definite matrix A.

• A is nonsingular and all of its eigenvalues are positive.

• The inverse of A is also positive definite.

There are similar definitions for negative definiteness and negative semidefiniteness.

Note the following property, which can be used to determine whether a symmetric
(Hermitian) matrix is positive (semi-)definite. A symmetric (Hermitian) matrix is
positive definite if and only if:

(i) Every diagonal element is positive.

(ii) Every leading principal minor matrix has positive determinant.

On the other hand, a symmetric (Hermitian) matrix is positive semidefinite if and only if:

(i) Every diagonal element is nonnegative.

(ii) Every principal minor matrix has nonnegative determinant.
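Note also that the principal minor matrices are the submatrices obtained by deleting
the same set of rows and columns from A, so that their diagonal elements are taken
from the diagonal of A. For a 3 × 3 matrix, the principal minor matrices are

$$a_{11},\; a_{22},\; a_{33},\; \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},\; \begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix},\; \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix},\; \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

among which the leading ones are

$$a_{11},\; \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},\; \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$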

B.13 SCALAR (DOT) PRODUCT AND VECTOR (CROSS) PRODUCT 

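A scalar product of two N-dimensional vectors x and y is denoted by x · y and
is defined by

$$\mathbf{x} \cdot \mathbf{y} = \sum_{n=1}^{N} x_n y_n = \mathbf{x}^T \mathbf{y} \qquad \text{(B.13.1)}$$

A vector (cross) product of two three-dimensional column vectors $\mathbf{x} = [x_1\; x_2\; x_3]^T$ and
$\mathbf{y} = [y_1\; y_2\; y_3]^T$ is denoted by x × y and is defined by

$$\mathbf{x} \times \mathbf{y} = \begin{bmatrix} x_2 y_3 - x_3 y_2 \\ x_3 y_1 - x_1 y_3 \\ x_1 y_2 - x_2 y_1 \end{bmatrix} \qquad \text{(B.13.2)}$$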
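Both products have built-in MATLAB counterparts, dot() and cross(). The sketch below, with arbitrary example vectors, compares them with the componentwise definitions and confirms that x × y is orthogonal to both x and y.

```matlab
% Scalar product (B.13.1) and vector (cross) product (B.13.2) of 3-D vectors
x = [1; 2; 3]; y = [4; 5; 6];                 % example column vectors
s = x.'*y;                                     % same as dot(x, y)
c = [x(2)*y(3) - x(3)*y(2);
     x(3)*y(1) - x(1)*y(3);
     x(1)*y(2) - x(2)*y(1)];                   % componentwise formula
disp([s dot(x, y)])                            % 32 32
disp([c cross(x, y)])                          % [-3; 6; -3] twice
disp([dot(c, x) dot(c, y)])                    % 0 0: c is orthogonal to x and y
```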


B.14 MATRIX INVERSION LEMMA 


Matrix Inversion Lemma. Let A, C, and $[C^{-1} + DA^{-1}B]$ be nonsingular, and let all
the matrices have compatible dimensions. Then we have

$$[A + BCD]^{-1} = A^{-1} - A^{-1}B[C^{-1} + DA^{-1}B]^{-1}DA^{-1} \qquad \text{(B.14.1)}$$

Proof. We will show that postmultiplying the right-hand side of Eq. (B.14.1) by
$[A + BCD]$ yields an identity matrix.

$$\begin{aligned}
&[A^{-1} - A^{-1}B[C^{-1} + DA^{-1}B]^{-1}DA^{-1}][A + BCD] \\
&= I + A^{-1}BCD - A^{-1}B[C^{-1} + DA^{-1}B]^{-1}D - A^{-1}B[C^{-1} + DA^{-1}B]^{-1}DA^{-1}BCD \\
&= I + A^{-1}BCD - A^{-1}B[C^{-1} + DA^{-1}B]^{-1}C^{-1}CD - A^{-1}B[C^{-1} + DA^{-1}B]^{-1}DA^{-1}BCD \\
&= I + A^{-1}BCD - A^{-1}B[C^{-1} + DA^{-1}B]^{-1}[C^{-1} + DA^{-1}B]CD \\
&= I + A^{-1}BCD - A^{-1}BCD = I
\end{aligned}$$
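The lemma can also be confirmed numerically. In this MATLAB sketch all matrices are arbitrary examples chosen so that every inverse exists (A is a positive diagonal matrix, C is positive definite, and D = B'); it compares the directly inverted left-hand side of (B.14.1) with the right-hand side.

```matlab
% Numerical check of the matrix inversion lemma (B.14.1) on example matrices
A = diag([4 5 6 7]);                % 4 x 4 nonsingular and trivial to invert
B = [1 0; 0 1; 1 1; 2 0];           % 4 x 2
C = [2 1; 1 3];                     % 2 x 2 nonsingular (positive definite)
D = B.';                            % 2 x 4
lhs = inv(A + B*C*D);
Ai  = inv(A);
rhs = Ai - Ai*B*inv(inv(C) + D*Ai*B)*D*Ai;
disp(norm(lhs - rhs))               % ~0 up to rounding
```

The practical appeal is visible in the dimensions: when A^{-1} is already known, the right-hand side only needs the inverse of a 2 × 2 matrix instead of a 4 × 4 one.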



APPENDIX C

DIFFERENTIATION WITH RESPECT TO A VECTOR


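The first derivative of a scalar-valued function $f(\mathbf{x})$ with respect to a vector
$\mathbf{x} = [x_1\; x_2]^T$ is called the gradient of $f(\mathbf{x})$ and is defined as

$$\nabla f(\mathbf{x}) = \begin{bmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{bmatrix} \qquad \text{(C.1)}$$

Based on this definition, we can write the following equations:

$$\frac{\partial}{\partial \mathbf{x}} \mathbf{x}^T\mathbf{y} = \frac{\partial}{\partial \mathbf{x}} \mathbf{y}^T\mathbf{x} = \frac{\partial}{\partial \mathbf{x}}(x_1 y_1 + x_2 y_2) = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \mathbf{y} \qquad \text{(C.2)}$$

$$\frac{\partial}{\partial \mathbf{x}} \mathbf{x}^T\mathbf{x} = \frac{\partial}{\partial \mathbf{x}}(x_1^2 + x_2^2) = 2\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 2\mathbf{x} \qquad \text{(C.3)}$$

Also, with an M × N matrix A, we have

$$\frac{\partial}{\partial \mathbf{x}} \mathbf{x}^T A \mathbf{y} = \frac{\partial}{\partial \mathbf{x}} \mathbf{y}^T A^T \mathbf{x} = A\mathbf{y} \qquad \text{(C.4a)}$$

$$\frac{\partial}{\partial \mathbf{x}} \mathbf{y}^T A \mathbf{x} = \frac{\partial}{\partial \mathbf{x}} \mathbf{x}^T A^T \mathbf{y} = A^T\mathbf{y} \qquad \text{(C.4b)}$$

where

$$\mathbf{x}^T A \mathbf{y} = \sum_{m=1}^{M}\sum_{n=1}^{N} a_{mn} x_m y_n \qquad \text{(C.5)}$$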


Applied Numerical Methods Using MATLAB®, by Yang, Cao, Chung, and Morris
Copyright © 2005 John Wiley & Sons, Inc., ISBN 0-471-69833-4 




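In particular, for a square matrix A (M = N), we have

$$\frac{\partial}{\partial \mathbf{x}} \mathbf{x}^T A \mathbf{x} = (A + A^T)\mathbf{x} \;\xrightarrow{\text{if } A \text{ is symmetric}}\; 2A\mathbf{x} \qquad \text{(C.6)}$$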
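Gradient formulas such as (C.4a) and (C.6) are easy to verify with central differences. In this MATLAB sketch A, y, and the evaluation point x0 are arbitrary examples; A is deliberately not symmetric so that the (A + A')x form of (C.6) is exercised.

```matlab
% Finite-difference check of (C.4a) and (C.6)
A = [1 2; 3 4]; y = [1; -1]; x0 = [0.5; -2];   % example data (A not symmetric)
f1 = @(x) x.'*A*y;                             % gradient should be A*y
f2 = @(x) x.'*A*x;                             % gradient should be (A + A.')*x
h = 1e-6; g1 = zeros(2, 1); g2 = zeros(2, 1);
for i = 1:2
  e = zeros(2, 1); e(i) = h;
  g1(i) = (f1(x0 + e) - f1(x0 - e))/(2*h);     % central difference in x_i
  g2(i) = (f2(x0 + e) - f2(x0 - e))/(2*h);
end
disp([g1 A*y])                                 % [-1 -1; -1 -1]
disp([g2 (A + A.')*x0])                        % [-9 -9; -13.5 -13.5]
```

Because both functions are at most quadratic in x, the central difference is exact apart from rounding error.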


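The second derivative of a scalar function $f(\mathbf{x})$ with respect to a vector
$\mathbf{x} = [x_1\; x_2]^T$ is called the Hessian of $f(\mathbf{x})$ and is defined as

$$\nabla^2 f(\mathbf{x}) = \begin{bmatrix} \partial^2 f/\partial x_1^2 & \partial^2 f/\partial x_1 \partial x_2 \\ \partial^2 f/\partial x_2 \partial x_1 & \partial^2 f/\partial x_2^2 \end{bmatrix} \qquad \text{(C.7)}$$

Based on this definition, we can write the following equation:

$$\frac{\partial^2}{\partial \mathbf{x}^2} \mathbf{x}^T A \mathbf{x} = A + A^T \;\xrightarrow{\text{if } A \text{ is symmetric}}\; 2A \qquad \text{(C.8)}$$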


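On the other hand, the first derivative of a vector-valued function $\mathbf{f}(\mathbf{x})$ with
respect to a vector $\mathbf{x} = [x_1\; x_2]^T$ is called the Jacobian of $\mathbf{f}(\mathbf{x})$ and is defined as

$$J(\mathbf{x}) = \frac{\partial}{\partial \mathbf{x}} \mathbf{f}(\mathbf{x}) = \begin{bmatrix} \partial f_1/\partial x_1 & \partial f_1/\partial x_2 \\ \partial f_2/\partial x_1 & \partial f_2/\partial x_2 \end{bmatrix} \qquad \text{(C.9)}$$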
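The Jacobian (C.9) can be approximated column by column, since column j holds the derivatives with respect to x_j. The example function in this MATLAB sketch, f(x) = [cos x1 cos x2; cos x1 sin x2], is an assumption chosen because its analytic Jacobian is easy to write down.

```matlab
% Jacobian (C.9) by central differences vs. the analytic Jacobian of an example f
f = @(x) [cos(x(1))*cos(x(2)); cos(x(1))*sin(x(2))];
Jexact = @(x) [-sin(x(1))*cos(x(2)), -cos(x(1))*sin(x(2)); ...
               -sin(x(1))*sin(x(2)),  cos(x(1))*cos(x(2))];
x0 = [0.3; 1.2]; h = 1e-6;                 % example point and step size
J = zeros(2, 2);
for j = 1:2                                % column j: derivatives w.r.t. x_j
  e = zeros(2, 1); e(j) = h;
  J(:, j) = (f(x0 + e) - f(x0 - e))/(2*h);
end
disp(norm(J - Jexact(x0)))                 % small (truncation + rounding error)
```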
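APPENDIX D

LAPLACE TRANSFORM


Table D.1 Laplace Transforms of Basic Functions

x(t) → X(s)

(1) $\delta(t) \;\rightarrow\; 1$
(2) $\delta(t - h) \;\rightarrow\; e^{-hs}$
(3) $u_s(t) \;\rightarrow\; \dfrac{1}{s}$
(4) $t^m u_s(t) \;\rightarrow\; \dfrac{m!}{s^{m+1}}$
(5) $e^{-at} u_s(t) \;\rightarrow\; \dfrac{1}{s + a}$
(6) $t^m e^{-at} u_s(t) \;\rightarrow\; \dfrac{m!}{(s + a)^{m+1}}$
(7) $\sin\omega t\; u_s(t) \;\rightarrow\; \dfrac{\omega}{s^2 + \omega^2}$
(8) $\cos\omega t\; u_s(t) \;\rightarrow\; \dfrac{s}{s^2 + \omega^2}$
(9) $e^{-at}\sin\omega t\; u_s(t) \;\rightarrow\; \dfrac{\omega}{(s + a)^2 + \omega^2}$
(10) $e^{-at}\cos\omega t\; u_s(t) \;\rightarrow\; \dfrac{s + a}{(s + a)^2 + \omega^2}$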
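Entries of Table D.1 can be spot-checked by evaluating the defining integral X(s) = ∫₀^∞ x(t)e^{-st} dt numerically at one value of s. This MATLAB sketch uses integral() (available in MATLAB R2012a and later) with arbitrary example values of a, ω, and s.

```matlab
% Numerical check of Table D.1 entries (5) and (7) at a sample s
a = 2; w = 3; s = 1.5;                              % example parameter values
X5 = integral(@(t) exp(-a*t).*exp(-s*t), 0, Inf);   % x(t) = e^(-a*t) u_s(t)
X7 = integral(@(t) sin(w*t).*exp(-s*t), 0, Inf);    % x(t) = sin(w*t) u_s(t)
disp([X5 1/(s + a)])                                % both about 0.2857
disp([X7 w/(s^2 + w^2)])                            % both about 0.2667
```

Such a check only holds for s inside the region of convergence (here s > -a for entry (5) and s > 0 for entry (7)).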


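Table D.2 Properties of Laplace Transform

(0) Definition: $X(s) = \mathcal{L}\{x(t)\} = \int_0^\infty x(t)e^{-st}\,dt$
(1) Linearity: $\alpha x(t) + \beta y(t) \rightarrow \alpha X(s) + \beta Y(s)$
(2) Time shifting: $x(t - h)u_s(t - h),\ h > 0 \rightarrow e^{-sh}X(s)$; more generally, $x(t - h)u_s(t),\ h > 0 \rightarrow e^{-sh}\left[X(s) + \int_{-h}^{0} x(\tau)e^{-s\tau}\,d\tau\right]$
(3) Frequency shifting: $e^{s_1 t}x(t) \rightarrow X(s - s_1)$
(4) Real convolution: $g(t) * x(t) \rightarrow G(s)X(s)$
(5) Time derivative: $x'(t) \rightarrow sX(s) - x(0)$
(6) Time integral: $\int_{-\infty}^{t} x(\tau)\,d\tau \rightarrow \frac{1}{s}X(s) + \frac{1}{s}\int_{-\infty}^{0} x(\tau)\,d\tau$
(7) Complex derivative: $t\,x(t) \rightarrow -\frac{d}{ds}X(s)$
(8) Complex convolution: $x(t)y(t) \rightarrow \frac{1}{2\pi j}\int_{\sigma_0 - j\infty}^{\sigma_0 + j\infty} X(v)Y(s - v)\,dv$
(9) Initial value theorem: $x(0) = \lim_{s\to\infty} sX(s)$
(10) Final value theorem: $x(\infty) = \lim_{s\to 0} sX(s)$ (if $x(\infty)$ exists)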
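APPENDIX E

FOURIER TRANSFORM


Table E.1 Properties of CtFT (Continuous-Time Fourier Transform)

(0) Definition: $X(\omega) = \mathcal{F}\{x(t)\} = \int_{-\infty}^{\infty} x(t)e^{-j\omega t}\,dt$
(1) Linearity: $\alpha x(t) + \beta y(t) \rightarrow \alpha X(\omega) + \beta Y(\omega)$
(2) Symmetry: $x(t) = x_e(t) + x_o(t)$: real $\rightarrow X(\omega) = X^*(-\omega)$
    $x_e(t)$: real and even $\rightarrow X_e(\omega) = \mathrm{Re}\{X(\omega)\}$
    $x_o(t)$: real and odd $\rightarrow X_o(\omega) = j\,\mathrm{Im}\{X(\omega)\}$
    $x(-t) \rightarrow X(-\omega)$
(3) Time shifting: $x(t - t_1) \rightarrow e^{-j\omega t_1}X(\omega)$
(4) Frequency shifting: $e^{j\omega_1 t}x(t) \rightarrow X(\omega - \omega_1)$
(5) Real convolution: $g(t) * x(t) = \int_{-\infty}^{\infty} g(\tau)x(t - \tau)\,d\tau \rightarrow G(\omega)X(\omega)$
(6) Time derivative: $x'(t) \rightarrow j\omega X(\omega)$
(7) Time integral: $\int_{-\infty}^{t} x(\tau)\,d\tau \rightarrow \frac{1}{j\omega}X(\omega) + \pi X(0)\delta(\omega)$
(8) Complex derivative: $t\,x(t) \rightarrow j\frac{d}{d\omega}X(\omega)$
(9) Complex convolution: $x(t)y(t) \rightarrow \frac{1}{2\pi}X(\omega) * Y(\omega)$
(10) Scaling: $x(at) \rightarrow \frac{1}{|a|}X\!\left(\frac{\omega}{a}\right)$
(11) Duality: $g(t) \rightarrow f(\omega) \;\Rightarrow\; f(t) \rightarrow 2\pi g(-\omega)$
(12) Parseval’s relation: $\int_{-\infty}^{\infty}|x(t)|^2\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty}|X(\omega)|^2\,d\omega$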
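Table E.2 Properties of DtFT (Discrete-Time Fourier Transform)

(0) Definition: $X(\Omega) = \sum_{n=-\infty}^{\infty} x[n]e^{-j\Omega n}$
(1) Linearity: $\alpha x[n] + \beta y[n] \rightarrow \alpha X(\Omega) + \beta Y(\Omega)$
(2) Symmetry: $x[n] = x_e[n] + x_o[n]$: real $\rightarrow X(\Omega) = X^*(-\Omega)$
    $x_e[n]$: real and even $\rightarrow X_e(\Omega) = \mathrm{Re}\{X(\Omega)\}$
    $x_o[n]$: real and odd $\rightarrow X_o(\Omega) = j\,\mathrm{Im}\{X(\Omega)\}$
    $x[-n] \rightarrow X(-\Omega)$
(3) Time shifting: $x[n - n_1] \rightarrow e^{-j\Omega n_1}X(\Omega)$
(4) Frequency shifting: $e^{j\Omega_1 n}x[n] \rightarrow X(\Omega - \Omega_1)$
(5) Real convolution: $g[n] * x[n] = \sum_{m=-\infty}^{\infty} g[m]x[n - m] \rightarrow G(\Omega)X(\Omega)$
(6) Complex derivative: $n\,x[n] \rightarrow j\frac{d}{d\Omega}X(\Omega)$
(7) Complex convolution: $x[n]y[n] \rightarrow \frac{1}{2\pi}X(\Omega) * Y(\Omega)$ (periodic/circular convolution)
(8) Scaling: $\begin{cases} x[n/M] & \text{if } n = mM\ (m\text{: an integer}) \\ 0 & \text{otherwise} \end{cases} \;\rightarrow\; X(M\Omega)$
(9) Parseval’s relation: $\sum_{n=-\infty}^{\infty}|x[n]|^2 = \frac{1}{2\pi}\int_{2\pi}|X(\Omega)|^2\,d\Omega$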
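For a finite-length sequence, the N-point DFT computed by fft() gives samples X(2πk/N) of the DtFT, so some of the properties in Table E.2 can be checked numerically. This MATLAB sketch uses arbitrary example sequences starting at n = 0 and enough zero padding to avoid time-domain aliasing; it checks the real-convolution property (5) and Parseval's relation (9).

```matlab
% DtFT properties (5) and (9) checked through zero-padded DFTs
x = [1 2 3 0 -1]; g = [1 -1 2];            % example sequences starting at n = 0
N = length(x) + length(g) - 1;             % enough points for the linear convolution
X = fft(x, N); G = fft(g, N);              % samples of X(Omega), G(Omega) at 2*pi*k/N
y = real(ifft(G.*X));                      % inverse DFT of the sampled product
disp(norm(y - conv(g, x)))                 % ~0: equals the linear convolution g*x
lhs = sum(abs(x).^2);                      % sum of |x[n]|^2 = 15
rhs = sum(abs(X).^2)/N;                    % (1/2pi) * integral of |X|^2, via N samples
disp([lhs rhs])
```

With fewer than length(x) + length(g) - 1 points the product of DFTs would correspond to a circular, not linear, convolution.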
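APPENDIX F

USEFUL FORMULAS


Formulas for Summation of Finite Number of Terms

$$\sum_{n=0}^{N} n a^n = a\,\frac{1 - (N+1)a^N + N a^{N+1}}{(1 - a)^2}$$

$$\sum_{n=1}^{N} n^2 = \frac{N(N+1)(2N+1)}{6}$$

$$\sum_{n=1}^{N} n(n+1) = \frac{N(N+1)(N+2)}{3}$$

$$(a + b)^N = \sum_{n=0}^{N} {}_N C_n\, a^{N-n} b^n \quad \text{with} \quad {}_N C_n = {}_N C_{N-n} = \frac{N!}{(N-n)!\,n!}$$

Formulas for Summation of Infinite Number of Terms

$$\sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}, \quad |x| < 1 \qquad \text{(F.7)}$$

$$\sum_{n=0}^{\infty} \frac{(-1)^n}{2n + 1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \frac{\pi}{4}$$

$$\sum_{n=1}^{\infty} \frac{1}{n^2} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots = \frac{\pi^2}{6}$$

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + \frac{1}{1!}x + \frac{1}{2!}x^2 + \cdots$$

$$\ln(1 \pm x) = -\sum_{n=1}^{\infty} \frac{(\mp 1)^n x^n}{n} = \pm x - \frac{1}{2}x^2 \pm \frac{1}{3}x^3 - \cdots, \quad |x| < 1$$

$$\sin x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!} = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots$$

$$\cos x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots$$

$$\tan^{-1} x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{2n+1} = x - \frac{1}{3}x^3 + \frac{1}{5}x^5 - \frac{1}{7}x^7 + \cdots, \quad |x| \le 1$$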
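Summation formulas like these are easy to spot-check numerically. The MATLAB sketch below evaluates both sides of several of them for arbitrary example values of N, a, b, and x; for the infinite series it uses partial sums whose neglected tails are small (below about 1e-5 for the 1/n^2 series with 10^5 terms).

```matlab
% Spot-check of some summation formulas for example values of N, a, b, x
N = 10; a = 0.7; b = 1.3; n = 0:N;
s1 = [sum(n.*a.^n), a*(1 - (N + 1)*a^N + N*a^(N + 1))/(1 - a)^2];
s2 = [sum(n.^2), N*(N + 1)*(2*N + 1)/6];
s3 = [sum(n.*(n + 1)), N*(N + 1)*(N + 2)/3];
% binomial expansion with nchoosek(N, k) = N!/((N - k)! k!)
terms = arrayfun(@(k) nchoosek(N, k)*a^(N - k)*b^k, n);
s4 = [sum(terms), (a + b)^N];
% partial sums of two infinite series
x = 0.5;
s5 = [sum(x.^(0:60)), 1/(1 - x)];
s6 = [sum(1./(1:1e5).^2), pi^2/6];
disp([s1; s2; s3; s4; s5; s6])   % each row: [left-hand side, right-hand side]
```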

Trigonometric Formulas

$$\sin(A \pm B) = \sin A\cos B \pm \cos A\sin B \qquad \text{(F.19)}$$

$$\cos(A \pm B) = \cos A\cos B \mp \sin A\sin B \qquad \text{(F.20)}$$

$$\tan(A \pm B) = \frac{\tan A \pm \tan B}{1 \mp \tan A\tan B}$$

$$\sin A\sin B = \tfrac{1}{2}\{\cos(A - B) - \cos(A + B)\}$$

$$\sin A\cos B = \tfrac{1}{2}\{\sin(A + B) + \sin(A - B)\}$$

$$\cos A\sin B = \tfrac{1}{2}\{\sin(A + B) - \sin(A - B)\}$$

$$\cos A\cos B = \tfrac{1}{2}\{\cos(A + B) + \cos(A - B)\}$$

$$a\cos A - b\sin A = \sqrt{a^2 + b^2}\,\cos(A + \theta), \quad \theta = \tan^{-1}\frac{b}{a}\ (a > 0)$$

$$a\sin A + b\cos A = \sqrt{a^2 + b^2}\,\sin(A + \theta), \quad \theta = \tan^{-1}\frac{b}{a}\ (a > 0)$$

$$\sin^2 A = \tfrac{1}{2}(1 - \cos 2A) \qquad \text{(F.30)}$$

$$\cos^2 A = \tfrac{1}{2}(1 + \cos 2A) \qquad \text{(F.31)}$$
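$$\sin^3 A = \tfrac{1}{4}(3\sin A - \sin 3A) \qquad \text{(F.32)}$$

$$\cos^3 A = \tfrac{1}{4}(3\cos A + \cos 3A) \qquad \text{(F.33)}$$

$$\sin 2A = 2\sin A\cos A \qquad \text{(F.34)}$$

$$\sin 3A = 3\sin A - 4\sin^3 A \qquad \text{(F.35)}$$

$$\cos 2A = \cos^2 A - \sin^2 A = 1 - 2\sin^2 A = 2\cos^2 A - 1 \qquad \text{(F.36)}$$

$$\cos 3A = 4\cos^3 A - 3\cos A \qquad \text{(F.37)}$$

$$\frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C} \qquad \text{(F.38)}$$

$$a^2 = b^2 + c^2 - 2bc\cos A \qquad \text{(F.39a)}$$

$$b^2 = c^2 + a^2 - 2ca\cos B \qquad \text{(F.39b)}$$

$$c^2 = a^2 + b^2 - 2ab\cos C \qquad \text{(F.39c)}$$

$$e^{\pm j\theta} = \cos\theta \pm j\sin\theta \qquad \text{(F.40)}$$

$$\sin\theta = \frac{1}{2j}(e^{j\theta} - e^{-j\theta}) \qquad \text{(F.41a)}$$

$$\cos\theta = \frac{1}{2}(e^{j\theta} + e^{-j\theta}) \qquad \text{(F.41b)}$$

$$\tan\theta = \frac{1}{j}\,\frac{e^{j\theta} - e^{-j\theta}}{e^{j\theta} + e^{-j\theta}} \qquad \text{(F.41c)}$$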
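A few of these identities can be spot-checked in MATLAB by evaluating the difference of both sides at arbitrary example angles; every residual should be at the level of rounding error. The equation numbers in the comments refer to the identities listed above.

```matlab
% Residuals of a few identities at arbitrary example angles; each should be ~0
A = 0.7; B = -1.9; th = 2.3;
r = zeros(6, 1);
r(1) = sin(A + B) - (sin(A)*cos(B) + cos(A)*sin(B));   % (F.19), upper sign
r(2) = cos(A - B) - (cos(A)*cos(B) + sin(A)*sin(B));   % (F.20), lower sign
r(3) = sin(A)^3 - (3*sin(A) - sin(3*A))/4;             % (F.32)
r(4) = cos(3*A) - (4*cos(A)^3 - 3*cos(A));             % (F.37)
r(5) = exp(1j*th) - (cos(th) + 1j*sin(th));            % (F.40)
r(6) = sin(th) - (exp(1j*th) - exp(-1j*th))/(2j);      % (F.41a)
disp(abs(r).')
```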
APPENDIX G

SYMBOLIC COMPUTATION


G.1 HOW TO DECLARE SYMBOLIC VARIABLES AND HANDLE 
SYMBOLIC EXPRESSIONS 

To declare any variable(s) as a symbolic variable, you should use the sym or 
syms command as below. 

»a = sym('a'); t = sym('t'); x = sym('x'); 

»syms a x y t %or, equivalently and more efficiently 

Once the variables have been declared as symbolic, they can be used in expressions
and as arguments to many functions without being evaluated as numeric.

»f = x^2/(1 + tan(x)^2);

»ezplot(f, [-pi, pi])

»simplify(cos(x)^2 + sin(x)^2) %simplify an expression

»simplify(cos(x)^2 - sin(x)^2) %simplify an expression

»simple(cos(x)^2 - sin(x)^2) %simple expression

»simple(cos(x) + i*sin(x)) %simple expression
ans = exp(i*x)

»eq1 = expand((x + y)^3 - (x + y)^2) %expand

»collect(eq1, y) %collect similar terms in descending order with respect to y
»factor(eq1) %factorize

ans = (x + y - 1)*(x + y)^2
»horner(eq1) %nested multiplication form

ans = (-1 + y)*y^2 + ((-2 + 3*y)*y + (-1 + 3*y + x)*x)*x
»pretty(ans) %pretty form
          2
(-1 + y) y  + ((-2 + 3 y) y + (-1 + 3 y + x) x) x

If you need to substitute numeric values or other expressions for some symbolic
variables in an expression, you can use the subs function as below.


»subs(eq1,x,0) %substitute a numeric value
ans = -y^2 + y^3

»subs(eq1,{x,y},{0,x - 1}) %substitute a numeric value and an expression
ans = (x - 1)^3 - (x - 1)^2

The sym command allows you to declare symbolic real variables by using the 
‘real’ option as illustrated below. 

»x = sym('x','real'); y = sym('y','real'); 

»syms x y real %or, equivalently 

»z = x + i*y; %declare z as a symbolic complex variable 
»conj(z) %complex conjugate 
ans = x - i*y 
»abs(z)

ans = (x^2 + y^2)^(1/2)

The sym function can be used to convert numeric values into their symbolic 
expressions. 

»sym(1/2) + 0.2 

ans = 7/10 %symbolic expression 


On the other hand, the double command converts symbolic expressions into 
their numeric (double-precision floating-point) values and the vpa command finds 
the variable-precision arithmetic (VPA) expression (as a symbolic representation) 
of a numeric or symbolic expression with d significant decimal digits, where d 
is the current setting of DIGITS that can be set by the digits command. Note 
that the output of the vpa command is a symbolic expression even if it may look 
like a numeric value. Let us see some examples. 

»f = sym('exp(i*pi/4)')
f = exp(i*pi/4) 

»double(f) 

ans = 0.7071 + 0.7071i %numeric value 
»vpa(ans,2) 

ans = .71 + .71*i %symbolic expression with 2 significant digits 
G.2 CALCULUS


G.2.1 Symbolic Summation 

We can use the symsum() function to obtain the sum of an indefinite/definite 
series as below. 


»syms x n N %declare x,n,N as symbolic variables
»simple(symsum(n,0,N))
ans = 1/2*N*(N + 1) %sum_{n=0}^{N} n = N(N + 1)/2

»simple(symsum(n^2,0,N))
ans = 1/6*N*(N + 1)*(2*N + 1) %sum_{n=0}^{N} n^2 = N(N + 1)(2N + 1)/6

»symsum(1/n^2,1,inf)
ans = 1/6*pi^2 %sum_{n=1}^{inf} 1/n^2 = pi^2/6

»symsum(x^n,n,0,inf)
ans = -1/(-1 + x) %sum_{n=0}^{inf} x^n = 1/(1 - x) under the assumption that |x| < 1


G.2.2 Limits 

We can use the limit() function to get the (two-sided) limit and the right- and
left-sided limits of a function as below.


»syms h n x

»limit(sin(x)/x,x,0) %lim_{x->0} sin(x)/x = 1

»limit(x/abs(x),x,0,'right') %lim_{x->0+} x/|x| = 1

»limit(x/abs(x),x,0,'left') %lim_{x->0-} x/|x| = -1

»limit(x/abs(x),x,0) %lim_{x->0} x/|x| = ?
ans = NaN %Not a Number

»limit((cos(x + h) - cos(x))/h,h,0) %lim_{h->0} (cos(x + h) - cos(x))/h = d/dx cos(x) = -sin(x)

»limit((1 + x/n)^n,n,inf) %lim_{n->inf} (1 + x/n)^n = e^x
ans = exp(x)


G.2.3 Differentiation 

The diff() function differentiates a symbolic expression w.r.t. the variable given
as its 2nd or 3rd input argument or, if none is given, w.r.t. its free variable, which
can be determined by using the findsym function.
»syms a b x n t
»diff(x^n)
ans = x^n*n/x
»simplify(ans)

»f = exp(a*x)*cos(b*t)

»diff(f) %equivalently diff(f,x)

ans = a*exp(a*x)*cos(b*t) %d/dx f = d/dx e^(ax) cos(bt) = a e^(ax) cos(bt)

»diff(f,t)

ans = -exp(a*x)*sin(b*t)*b %d/dt f = d/dt e^(ax) cos(bt) = -b e^(ax) sin(bt)

»diff(f,2) %equivalently diff(f,x,2)

ans = a^2*exp(a*x)*cos(b*t) %d^2/dx^2 f = a^2 e^(ax) cos(bt)

»diff(f,t,2)

ans = -exp(a*x)*cos(b*t)*b^2 %d^2/dt^2 f = -b^2 e^(ax) cos(bt)

»g = [cos(x)*cos(t) cos(x)*sin(t)];

»jacob_g = jacobian(g,[x t])

jacob_g = [ -sin(x)*cos(t), -cos(x)*sin(t)]
          [ -sin(x)*sin(t),  cos(x)*cos(t)]


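Note that the jacobian() function finds the Jacobian defined by (C.9), that is,
the derivative of a vector function $[g_1\; g_2]^T$ with respect to a vector variable
$[x\; t]^T$, as

$$J = \begin{bmatrix} \partial g_1/\partial x & \partial g_1/\partial t \\ \partial g_2/\partial x & \partial g_2/\partial t \end{bmatrix} \qquad \text{(G.1)}$$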


G.2.4 Integration 

The int() function returns the indefinite/definite integral (anti-derivative) of a
function or an expression with respect to the variable given as its second input
argument or its free variable, which can be determined by using the findsym
function.


»syms a x y t

»int(1/(1 + x^2))

ans = atan(x) %integral of 1/(1 + x^2) dx = tan^(-1)(x)

»int(a^x) %equivalently int(a^x,x)

ans = 1/log(a)*a^x %integral of a^x dx = a^x/log(a)

»int(sin(a*t),0,pi) %equivalently int(sin(a*t),t,0,pi)

ans = -cos(pi*a)/a + 1/a %integral_0^pi sin(at) dt = -(1/a)cos(at)|_0^pi = -(1/a)cos(a*pi) + 1/a
»int(exp(-(x - a)^2),a,inf) %equivalently int(exp(-(x - a)^2),x,a,inf)
ans = 1/2*pi^(1/2) %integral_a^inf e^(-(x - a)^2) dx = integral_0^inf e^(-x^2) dx = sqrt(pi)/2

G.2.5 Taylor Series Expansion 

We can use the taylor() function to find the Taylor series expansion of a 
function or an expression with respect to the variable given as its second or 
third input argument or its free variable that might be determined by using the 
findsym function. 

One may type ‘help taylor’ into the MATLAB command window to see its
usage, which is summarized in the list below. Let us try applying it.

»syms x t; N = 3;

»Tx0 = taylor(exp(-x),N + 1) %f(x) = sum_{n=0}^{N} f^(n)(0)/n! x^n

Tx0 = 1 - x + 1/2*x^2 - 1/6*x^3

»sym2poly(Tx0) %extract the coefficients of Taylor series polynomial
ans = -0.1667 0.5000 -1.0000 1.0000

»xo = 1; Tx1 = taylor(exp(-x),N + 1,xo) %f(x) = sum_{n=0}^{N} f^(n)(xo)/n! (x - xo)^n

Tx1 = exp(-1) - exp(-1)*(x - 1) + 1/2*exp(-1)*(x - 1)^2 - 1/6*exp(-1)*(x - 1)^3

»pretty(Tx1)

                                               2                      3
exp(-1) - exp(-1) (x - 1) + 1/2 exp(-1) (x - 1)  - 1/6 exp(-1) (x - 1)

»f = exp(-x)*sin(t);

»Tt = taylor(f,N + 1,t) %f(t) = sum_{n=0}^{N} f^(n)(0)/n! t^n
Tt = exp(-x)*t - 1/6*exp(-x)*t^3


• taylor(f) gives the fifth-order Maclaurin series expansion of f.

• taylor(f,n + 1) with an integer n > 0 gives the nth-order Maclaurin series
expansion of f.

• taylor(f,a) with a real number a gives the fifth-order Taylor series
expansion of f about a.

• taylor(f,n + 1,a) gives the nth-order Taylor series expansion of f about
default_variable = a.

• taylor(f,n + 1,a,y) gives the nth-order Taylor series expansion of f(y)
about y = a.

(cf) The target function f must be a legitimate expression given directly as the first
input argument.

(cf) Before using the command “taylor()”, one should declare the arguments of the
function as symbols by putting, say, “syms x t”.

(cf) In case the function has several arguments, it is good practice to put the
independent variable as the last input argument of “taylor()”, though taylor() takes
the one closest (alphabetically) to ‘x’ as the independent variable by default, provided
it has been declared as a symbolic variable and is contained as an input argument
of the function f.

(cf) One should use the MATLAB command “sym2poly()” to extract the
coefficients from the Taylor series expansion obtained as a symbolic expression.


G.3 LINEAR ALGEBRA 


Several MATLAB commands and functions can be used to manipulate vectors or
matrices consisting of symbolic expressions as well as those consisting of numerics.


»syms a11 a12 a21 a22
»A = [a11 a12; a21 a22];

»det(A)

ans = a11*a22 - a12*a21
»AI = A^-1

AI = [ a22/(a11*a22 - a12*a21), -a12/(a11*a22 - a12*a21)]
     [ -a21/(a11*a22 - a12*a21), a11/(a11*a22 - a12*a21)]

»A*AI

ans = [ a11*a22/(a11*a22 - a12*a21) - a12*a21/(a11*a22 - a12*a21), 0]
      [ 0, a11*a22/(a11*a22 - a12*a21) - a12*a21/(a11*a22 - a12*a21)]

»simplify(ans) %simplify an expression
ans = [ 1, 0]
      [ 0, 1]

»syms x t;

»G = [cos(t) sin(t); -sin(t) cos(t)] %the Givens transformation matrix
G = [ cos(t), sin(t)]
    [ -sin(t), cos(t)]

»det(G), simple(ans)
ans = cos(t)^2 + sin(t)^2

»G2 = G^2, simple(G2)

G2 = [ cos(t)^2 - sin(t)^2, 2*cos(t)*sin(t)]
     [ -2*cos(t)*sin(t), cos(t)^2 - sin(t)^2]
ans = [ cos(2*t), sin(2*t)]
      [ -sin(2*t), cos(2*t)]

»GTG = G.'*G, simple(GTG)

GTG = [ cos(t)^2 + sin(t)^2, 0]
      [ 0, cos(t)^2 + sin(t)^2]

ans = [ 1, 0]
      [ 0, 1]

»simple(G^-1) %inv(G) for the inverse of Givens transformation matrix
ans = [ cos(t), -sin(t)]
      [ sin(t), cos(t)]

»syms b c
»A = [0 1; -c -b];

»[V,E] = eig(A)

V = [ -(1/2*b + 1/2*(b^2 - 4*c)^(1/2))/c, -(1/2*b - 1/2*(b^2 - 4*c)^(1/2))/c]
    [ 1, 1]
E = [ -1/2*b + 1/2*(b^2 - 4*c)^(1/2), 0]
    [ 0, -1/2*b - 1/2*(b^2 - 4*c)^(1/2)]

»solve(poly(A)) %another way to get eigenvalues (characteristic roots)
ans = [ -1/2*b + 1/2*(b^2 - 4*c)^(1/2)]
      [ -1/2*b - 1/2*(b^2 - 4*c)^(1/2)]
In addition, other MATLAB functions such as jordan(A) and svd(A) can be
used to get the Jordan canonical form together with the corresponding similarity
transformation matrix and the singular value decomposition of a symbolic matrix.


G.4 SOLVING ALGEBRAIC EQUATIONS 

We can use the backslash (\) operator to solve a set of linear equations written 
in a matrix-vector form. 

»syms R11 R12 R21 R22 b1 b2

»R = [R11 R12; R21 R22]; b = [b1; b2];

»x = R\b

x = [ (R12*b2 - b1*R22)/(-R11*R22 + R21*R12)]
    [ (-R11*b2 + R21*b1)/(-R11*R22 + R21*R12)]

We can also use the MATLAB function solve() to solve symbolic algebraic
equations.

»syms a b c x
»fx = a*x^2 + b*x + c;

»solve(fx) %formula for roots of 2nd-order polynomial eq.
ans = [ 1/2/a*(-b + (b^2 - 4*a*c)^(1/2))]
      [ 1/2/a*(-b - (b^2 - 4*a*c)^(1/2))]

»syms x1 x2 b1 b2

»fx1 = x1 + x2 - b1; fx2 = x1 + 2*x2 - b2; %a system of simultaneous algebraic eq.
»[x1o,x2o] = solve(fx1,fx2)

x1o = 2*b1 - b2
x2o = -b1 + b2


G.5 SOLVING DIFFERENTIAL EQUATIONS 


We can use the MATLAB function dsolve() to solve symbolic differential 
equations. 


»xo = dsolve('Dx + a*x = 0') % a differential eq.(d.e.) w/o initial condition
xo = exp(-a*t)*C1 % a solution with undetermined constant
»xo = dsolve('Dx + a*x = 0','x(0) = 2') % a d.e. with initial condition
xo = 2*exp(-a*t) % a solution with determined constant
»xo = dsolve('Dx = 1 + x^2') % a differential eq. w/o initial condition
xo = tan(t - C1) % a solution with undetermined constant
»xo = dsolve('Dx = 1 + x^2','x(0) = 1') % with the initial condition
xo = tan(t + 1/4*pi) % a solution with determined constant
»yo = dsolve('D2u = -u','t') % a 2nd-order d.e. without initial condition
yo = C1*sin(t) + C2*cos(t)

»xo = dsolve('D2u = -u','u(0) = 1,Du(0) = 0','t') % with the initial condition
xo = cos(t)

»yo = dsolve('(Dy)^2 + y^2 = 1','y(0) = 0','x') % a 1st-order nonlinear d.e.(nlde)
yo = [ sin(x)] %two solutions
     [ -sin(x)]

»yo = dsolve('D2y = cos(2*x) - y','y(0) = 1,Dy(0) = 0','x') % a 2nd-order linear d.e.
yo = 4/3*cos(x) - 2/3*cos(x)^2 + 1/3
»S = dsolve('Df = 3*f + 4*g','Dg = -4*f + 3*g');
»f = S.f, g = S.g

f = exp(3*t)*(C1*sin(4*t) + C2*cos(4*t))

g = exp(3*t)*(C1*cos(4*t) - C2*sin(4*t))

»[f,g] = dsolve('Df = 3*f + 4*g, Dg = -4*f + 3*g','f(0) = 0, g(0) = 1')
f = exp(3*t)*sin(4*t)

g = exp(3*t)*cos(4*t)
APPENDIX H

SPARSE MATRICES


A matrix is said to be sparse if it has a large portion of zero elements. MATLAB 
has some built-in functions/routines that enable us to exploit the sparsity of a 
matrix for computational efficiency. 

The MATLAB routine sparse() can be used to convert a (regular) matrix
into a sparse form by squeezing out any zero elements, or to generate a sparse
matrix from a vector of elements given together with the row/column index
vectors. On the other hand, the MATLAB routine full() can be used to convert
a matrix of sparse form into a regular one.

>>row_index = [1 1 2 3 4]; col_index = [1 2 2 3 4]; elements = [1 2 3 4 5];
>>m = 4; n = 4; As = sparse(row_index,col_index,elements,m,n)
As = (1,1) 1
     (1,2) 2
     (2,2) 3
     (3,3) 4
     (4,4) 5
>>Af = full(As)
Af = 1 2 0 0
     0 3 0 0
     0 0 4 0
     0 0 0 5
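The correspondence between the sparse and full forms can also be checked programmatically. The sketch below rebuilds the matrix of the example above, recovers the (row, column, value) triplets with find(), counts the nonzeros with nnz(), and converts the full matrix back with sparse(). It also shows that sparse() adds up values given for the same (row, column) position, which is worth keeping in mind when a matrix is assembled element by element.

```matlab
row_index = [1 1 2 3 4]; col_index = [1 2 2 3 4]; elements = [1 2 3 4 5];
As = sparse(row_index, col_index, elements, 4, 4);
[r, c, v] = find(As);        % triplets of the nonzeros, listed column by column
nz = nnz(As)                 % number of nonzero elements
Af = full(As);               % regular (full) storage
As2 = sparse(Af);            % squeeze the zeros out again
same = isequal(As, As2)      % the round trip preserves the matrix
B = sparse([1 1], [1 1], [2 3], 2, 2);  % two values given for position (1,1)
full(B)                      % they are summed, so B(1,1) = 5
```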

We can use the MATLAB routine sprandn(m,n,nzd) to generate an m x n
sparse matrix having the given non-zero density nzd. Let us see how much more
efficiently operations can be performed on matrices in sparse form.

>>As = sprandn(10,10,0.2); %a sparse matrix and
>>Af = full(As); %its full version
>>flops(0), AsA = As*As; flops %in sparse forms
ans = 50

>>flops(0), AfA = Af*Af; flops %in full(regular) forms
ans = 2000

>>b = ones(10,1); flops(0), x = As\b; flops
ans = 160

>>flops(0), x = Af\b; flops
ans = 592

>>flops(0), inv(As); flops
ans = 207

>>flops(0), inv(Af); flops
ans = 592

>>flops(0), [L,U,P] = lu(As); flops
ans = 53

>>flops(0), [L,U,P] = lu(Af); flops
ans = 92

Additionally, the MATLAB routine speye(n) generates an n x n sparse
identity matrix and the MATLAB routine spy(A) visualizes the sparsity
pattern of a matrix A. The computational efficiency of LU factorization can be
improved if one pre-orders the sparse matrix by the symmetric minimum degree
permutation, which is implemented in the MATLAB routine symmmd().
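The effect of pre-ordering can be seen by counting the nonzeros (fill-in) of the LU factors with and without a symmetric permutation. The sketch below works under stated assumptions: the test matrix is the 2-D Laplacian built by delsq(numgrid('S',n)), chosen only because it is a convenient sparse symmetric matrix, and the ordering comes from symamd() (symmetric approximate minimum degree), since, as far as we know, symmmd() is no longer available in recent MATLAB releases.

```matlab
n = 30;
A = delsq(numgrid('S', n));          % sparse 5-point Laplacian (test matrix)
I = speye(size(A));                  % sparse identity matrix of the same size
p = symamd(A);                       % symmetric approximate minimum degree ordering
[L1, U1, P1] = lu(A);                % LU factorization in the natural ordering
[L2, U2, P2] = lu(A(p, p));          % LU factorization after symmetric reordering
fill_natural = nnz(L1) + nnz(U1)     % nonzeros of the factors without reordering
fill_symamd  = nnz(L2) + nnz(U2)     % nonzeros of the factors with reordering
resid = norm(P2*A(p, p) - L2*U2, 1)  % the reordered factorization is still exact
% spy(L1 + U1) and spy(L2 + U2) show where the fill-in occurs
```

Reordering does not change the solution of a linear system; it only changes how many nonzeros the factors carry and hence how much work the factorization costs.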

Interested readers are welcome to run the following program “do_sparse” to
figure out the functions of several sparsity-related MATLAB routines.


%do_sparse
clear, clf

%create a sparse mxn random matrix
m = 4; n = 5; A1 = sprandn(m,n,.2)

%create a sparse symmetric nxn random matrix with non-zero density nzd
nzd = 0.2; A2 = sprandsym(n,nzd)

%create a sparse symmetric random nxn matrix with reciprocal condition number r
r = 0.1; A3 = sprandsym(n,nzd,r)

%a sparse symmetric random nxn matrix with the set of eigenvalues eigs
eigs = [0.1 0.2 .3 .4 .5]; A4 = sprandsym(n,nzd,eigs)
eig(A4)

tic, A1A = A1*A1', time_sparse = toc
A1f = full(A1); tic, A1Af = A1f*A1f'; time_full = toc
spy(A1A), full(A1A), A1Af
sparse(A1Af)

n = 10; A5 = sprandsym(n,nzd)
tic, [L,U,P] = lu(A5); time_lu = toc
tic, [L,U,P] = lu(full(A5)); time_full = toc
mdo = symmmd(A5); %symmetric minimum degree permutation
tic, [L,U,P] = lu(A5(mdo,mdo)); time_md = toc


(cf) The command ‘flops’ is not available in MATLAB of version 6.x, which is why we
use ‘tic’ and ‘toc’ to measure the processing time instead of counting the number of
floating-point operations.




APPENDIX 


MATLAB 


First of all, the following should be noted: 

1. The index of an array in MATLAB starts from 1, not 0. 

2. A dot(.) must be put before an operator to make a termwise (element-by- 
element) operation. 
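The two points above can be illustrated with a few lines typed at the command prompt; the vectors x and y are arbitrary examples.

```matlab
x = [1 2 3]; y = [4 5 6];
p = x.*y     % termwise product:  [4 10 18]
q = x.^2     % termwise power:    [1 4 9]
r = x./y     % termwise division: [0.25 0.4 0.5]
s = x*y'     % matrix product of a 1x3 and a 3x1: 1*4 + 2*5 + 3*6 = 32
x1 = x(1)    % the first element has index 1; x(0) would cause an error
% x*y (without the dot) is an error, since a 1x3 matrix cannot multiply a 1x3 matrix
```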

Some useful MATLAB commands are listed in Table 1.1.

Table 1.1 Commonly Used Commands and Functions in MATLAB

General Commands
break                          exit from a for or while loop
fprintf                        print formatted data,
                               e.g., fprintf(‘\n x(%d) = %6.4f \a’,ind,x(ind))
keyboard                       stop execution and give control to the keyboard
                               until the user types return
return                         terminate a routine and go back to the calling routine
load *** x y                   read the values of x and y from the MATLAB file ***.mat
load x.dat                     read the value(s) of x from the ASCII file x.dat
save *** x y                   save the values of x and y into the MATLAB file ***.mat
save x.dat x -ascii            save the value(s) of x into the ASCII file x.dat
clear                          remove all or some variables/functions from memory

Two-Dimensional Graphic Commands
bar(x,y), plot(x,y),           plot the values of y versus x in a bar/continuous/
stairs(x,y), stem(x,y),        stairs/discrete/xy-log/x-log/y-log graph
loglog(x,y), semilogx(x,y),
semilogy(x,y)


Applied Numerical Methods Using MATLAB ®, by Yang, Cao, Chung, and Morris 
Copyright © 2005 John Wiley & Sons, Inc., ISBN 0-471-69833-4 






Table 1.1 (Continued)

plot(y) (y: real-valued)       plot the values of vector/array y over the index
plot(y) (y: complex-valued)    plot the imaginary part versus the real part,
                               i.e., plot(real(y),imag(y))

bar(y,s1s2s3),                 The string of three characters s1s2s3, given as one of the
plot(y,s1s2s3),                input arguments to these graphic commands, specifies the
stairs(y,s1s2s3),              color, the symbol, and the line type:
stem(y,s1s2s3),                s1 (color): y(ellow), m(agenta), c(yan), r(ed), g(reen),
loglog(y,s1s2s3),                  b(lue), w(hite), (blac)k
semilogx(y,s1s2s3),            s2 (symbol): .(point), o, x, +, *, s(quare: □), d(iamond: ◇),
semilogy(y,s1s2s3),                v(▽), ^(△), <(◁), >(▷), p(entagram: ☆), h(exagram)
plot(y1,s1s2s3,y2,s1s2s3)      s3 (line symbol): -(solid, default), :(dotted),
                                   -.(dashdot), --(dashed)
                               (ex) plot(x,‘b+:’) plots x(n) with the + symbols on
                               a blue dotted line

polar(theta,r)                 plot the graph in polar form with the phase theta and
                               magnitude r

Auxiliary Graphic Commands
axis([xmin xmax ymin ymax])    specify the ranges of graph on horizontal/vertical axes
clf (clear figure)             clear the existing graph(s)
grid on/off                    draw/remove the grid lines
hold on/off                    keep/remove the existing graph(s)
subplot(ijk)                   divide the screen into i x j sections and use the kth one
text(x,y,‘***’)                print the string ‘***’ at the position (x,y) on the graph
title(‘**’), xlabel(‘**’),     print the string ‘**’ at the top/bottom/left side of
ylabel(‘**’)                   the graph

Three-Dimensional Graphic Commands
mesh(X,Y,Z)                    connect the points of height Z at points (X,Y), where
                               X, Y, and Z are matrices of the same dimension
mesh(x,y,Z)                    connect the points of height Z(j,i) at points specified by
                               the two vectors (x(i),y(j))
mesh(Z), surf(), plot3(),      connect the points of height Z(j,i) at points specified by
contour()                      (i,j)
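A short script combining several of the commands in Table 1.1 is sketched below. The functions being plotted are arbitrary, and the figure is created with 'Visible','off' only so that the script can also run without a display; drop that option to see the figure on the screen.

```matlab
x = linspace(0, 2*pi, 41);             % 41 sample points on [0, 2*pi]
fig = figure('Visible', 'off');        % drop 'Visible','off' to show the figure
subplot(121)                           % 1 x 2 layout, use the 1st section
h1 = plot(x, sin(x), 'b+:');           % blue '+' markers on a dotted line
hold on                                % keep the existing graph
h2 = plot(x, cos(x), 'r-');            % red solid line on the same axes
hold off
axis([0 2*pi -1.2 1.2]), grid on
title('sin(x) and cos(x)'), xlabel('x'), ylabel('y')
text(pi, 0.2, 'sin(\pi) = 0')          % string placed at the position (pi, 0.2)
subplot(122)                           % use the 2nd section
stem(0:10, exp(-(0:10)/3))             % discrete (stem) plot of a decaying sequence
```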


Once you have installed MATLAB, you can click the MATLAB icon to run it.
You will then see the MATLAB command window on your monitor as depicted in
Fig. 1.1, where a cursor appears (most likely blinking) to the right of the
prompt ‘>>’, waiting for you to type in a command. If you are running
MATLAB of version 6.x, the main window has not only the command window,
but also the workspace box and the command history box on the upper-left and
lower-left sides of the command window, in which you can see the contents of
MATLAB memory and the commands you have typed into the Command window up to
the present time, respectively. You can clear the boxes by clicking the
corresponding submenu under the ‘Edit’ menu, and even remove/restore them by
unchecking/checking the corresponding submenu under the ‘View’ menu.

How do we work with the MATLAB command window? 

• By clicking ‘File’ on the top menu and then ‘New’/‘Open’ in the File pull-
down menu, you can create/edit any file with the MATLAB editor.

• By clicking ‘File’ on the top menu and then ‘Set Path’ in the File pull-down
menu, you can make the MATLAB search path include/exclude the paths
containing the files you want to run.

• If you are a beginner in MATLAB, then it may be worthwhile to click ‘Help’
on the top menu, click ‘Demos’ in the Help pull-down menu, (double-)click
any topic that you want to learn, and watch the visual explanation about it.

• By typing any MATLAB commands/statements in the MATLAB command
window, you can use various powerful mathematical/graphical functions
of MATLAB.

• If you have an m-file that contains a series of commands/statements composed
for performing your job, you can type in the file name (without the
extension ‘.m’) to make it run.

It is helpful to know the procedure of debugging in MATLAB, which is 
summarized below. 

1. With the program you want to debug loaded into the MATLAB Editor/
Debugger window, set breakpoint(s) at any statement(s) you suspect to be
the source(s) of error: click the pertinent statement line of the program
with the left mouse button and press the F12 key, or click ‘Set/Clear
Breakpoint’ in the ‘Breakpoints’ pull-down menu of the Editor/Debugger
window. Then you will see a small red disk in front of every statement at
which you set a breakpoint.

Figure 1.2 The MATLAB file editor/debugger window.

2. Go to the MATLAB Command window and type in the name of the file
containing the main program to run it. Then go back to the Editor/Debugger
window, where you will see the cursor blinking just after a green arrow
between the red disk and the first statement line at which you set a
breakpoint.

3. Having decided which variable(s) to look into, go to the Command window
and type in the variable name(s) (just after the prompt ‘K>>’) or whatever
statement you want to run for debugging.

4. If you want to proceed to the next statement line in the program, go back
to the Editor/Debugger window and press the F10 (single_step) key, or the
F11 (step_in) key to dig into a called routine. If you want to jump to the
next breakpoint, press F5 or click ‘Run (Continue)’ in the Debug pull-down
menu of the Editor/Debugger window. If you want to run the program until
just before a statement, move the cursor to the line and click ‘Go Until
Cursor’ in the Debug pull-down menu (see Fig. 1.2).

5. If you have figured out what is wrong, edit the pertinent part of the program,
save the edited program in the Editor/Debugger window, and then go to the
Command window and type the name of the file containing the main program
to test it. If the result suggests that the program still has a bug, go back to
step 1 and restart the whole procedure.
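The same breakpoint operations are also available as commands typed in the Command window, which can be handy when the Editor/Debugger window is not at hand. The sketch below writes a small hypothetical function debug_demo.m into the temporary folder, sets a breakpoint at its third line with dbstop (the command counterpart of F12), lists it with dbstatus, and clears it with dbclear. The interactive part is indicated in comments only, since running to a breakpoint needs a user at the keyboard: dbstep, dbstep in, and dbcont correspond to F10, F11, and F5, and dbquit leaves the debug mode.

```matlab
% Write a hypothetical two-statement function into the temporary folder
fname = fullfile(tempdir, 'debug_demo.m');
fid = fopen(fname, 'w');
fprintf(fid, 'function y = debug_demo(x)\n');
fprintf(fid, 'y = x.^2;\n');
fprintf(fid, 'y = y + 1;\n');
fclose(fid);
addpath(tempdir)                 % put the folder on the MATLAB search path
rehash                           % make sure the new file is noticed
dbstop in debug_demo at 3        % set a breakpoint at line 3 (like F12)
bp = dbstatus('debug_demo');     % query the breakpoints set in debug_demo
bp_lines = bp.line               % line number(s) of the breakpoint(s)
% Interactively, debug_demo(2) now stops at line 3 with the K>> prompt; then
%   dbstep      % execute the next line (F10)
%   dbstep in   % step into a called routine (F11)
%   dbcont      % continue to the next breakpoint (F5)
%   dbquit      % leave the debug mode
dbclear all                      % remove all breakpoints
```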
If you use MATLAB of version 5.x, you can refer to the usage of the
constrained minimization routine ‘constr()’, which is summarized in the box
below.


USAGE OF THE MATLAB 5.X BUILT-IN FUNCTION “CONSTR()”

FOR CONSTRAINED OPTIMIZATION 

[x,options] = constr('ftn',x0,options,l,u)

• Input arguments (only ‘ftn’ and x0 required, the others optional)

‘ftn’ : usually defined in an m-file and should return two output
arguments, one of which is a scalar value (f(x)) of the
function (ftn) to be minimized and the other is a vector
(g(x)) of constraints such that g(x) ≤ 0.

x0 : the initial guess of solution

options: is used for setting the termination tolerance on x, f(x), and
constraint violation through options(2)/(3)/(4), the number of
the (leading) equality constraints among g(x) ≤ 0 through
options(13), etc.

(For more details, type ‘help foptions’ into the MATLAB
command window)

l, u : lower/upper bound vectors such that l ≤ x ≤ u.

• Output arguments

x : minimum point reached in the permissible region satisfying
the constraints.

options: outputs some information about the search process and the
result, like the function value at the minimum point (x)
reached, through options(8).
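In later versions of the Optimization Toolbox, the role of constr() is played by fmincon() (listed in the index of MATLAB routines), whose constraint function returns the inequality constraints c(x) ≤ 0 and the equality constraints ceq(x) = 0 as two separate outputs. The sketch below assumes the Optimization Toolbox is installed and solves a small made-up problem: minimize f(x) = (x1 - 2)^2 + (x2 - 1)^2 subject to x1^2 + x2^2 ≤ 1 and -2 ≤ xi ≤ 2. Its solution is the point of the unit disk closest to (2, 1), namely (2, 1)/√5.

```matlab
% Hypothetical test problem (not from the text)
f = @(x) (x(1) - 2)^2 + (x(2) - 1)^2;          % scalar objective f(x)
nonlcon = @(x) deal(x(1)^2 + x(2)^2 - 1, []);  % [c, ceq]: c(x) <= 0, no equalities
x0 = [0; 0];                                   % initial guess
l = [-2; -2]; u = [2; 2];                      % bounds l <= x <= u
opts = optimset('Display', 'off');
[x, fx] = fmincon(f, x0, [], [], [], [], l, u, nonlcon, opts)
x_exact = [2; 1]/sqrt(5)                       % analytic minimizer for comparison
```

Unlike constr(), fmincon() keeps linear and nonlinear constraints in separate arguments, so the empty matrices [] stand for the absent linear inequality and equality constraints.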
SUBJECT INDEX


A 

Absolute error, 33 
Acceleration of Aitken, 201 
Adams-Bashforth-Moulton (ABM) method, 
269 

Adaptive input argument, 46 
Adaptive quadrature, 231 
Alignment, 30 

alternating direction implicit (ADI) method, 

W I 

Animation, 302, 438 
Apostrophe, 15 

Approximation, 124, 209, 212, 323

B 

backslash, 19, 59, 60, 76, 109, 110 

backward difference approximation, 210 

backward substitution, 82 

basis function, 420 

bilinear interpolation, 142 

bisection method, 183 

Boltzmann, 335 

boundary condition, 134, 401, 404, 420, 430, 432

Boundary mode, 434 
boundary node, 420 

boundary value problem (BVP), 287, 305-319 
bracketing method, 188 
breakpoint, 493 
Bulirsch-Stoer, 161 

C 

case, 24 

catastrophic cancellation, 32 

central difference approximation, 211, 212 

characteristic equation, 371, 465 


characteristic value, 371, 465 
characteristic vector, 371, 465 
Chebyshev coefficient polynomial, 126 
Chebyshev node, 125, 160 
Chebyshev polynomial, 124, 127, 240 
chemical reactor, 297 

Cholesky decomposition (factorization), 97 
circulant matrix, 391 
conjugate gradient, 332 
constrained linear least squares (LLS), 354 
constrained optimization, 343, 350, 352 
constructive solid geometry (CSG), 432 
contour, 11, 295, 345, 349 
convergence, 103, 378-379 
covariance matrix, 386 
Crank-Nicholson method, 409, 452 
CtFT, 68, 475 
cubic spline, 133, 162-164 
curve fitting, 143, 147, 165, 167 

D 

damped Newton method, 193 
data file, 47 
dat-file, 2 
dc motor, 298 
debugging, 493 
decoupling, 374, 376 
determinant, 464 
DFT, 151-156, 171-175 
diagonalization, 374-376 
difference approximation, 209, 211, 216, 218 
differential equation, 263, 487 
Dirichlet boundary condition, 404, 430, 434, 
452 

discretization, 281 
distinct eigenvalues, 373 
divided difference, 120-122 
double integral/integration, 241, 259
Draw mode, 432 

DtFT (Discrete-time Fourier Transform), 476 
E 

eigenmode PDE, 431 
eigenpair, 371 

eigenvalue, 371, 377, 385, 389, 465 
eigenvalue problem, 314, 389 
eigenvector, 371, 377, 385, 465 
electric potential, 427, 442 
element-by-element operation, 15, 52 
elliptic PDE, 401, 402, 420, 430 

error, 31, 33, 35, 40, 213, 226, 274 

error analysis, 159, 225 

error estimate, 226 

error magnification, 31 

error propagation, 33 

errorbar, 148 

Euler’s method, 263 

explicit central difference method, 415, 417 
explicit forward Euler method, 406, 410 
exponent field, 28 

F 

factorial, 40

false position, 185 

FFT (Fast Fourier Transform), 151 

finite difference method (FDM), 290 

finite element method (FEM), 420, 431, 455 

fixed-point, 99, 179, 197

Fletcher-Reeves (FR), 332, 333 
forward difference approximation, 209, 218, 
406 

Fourier series/transform, 150, 475 
full pivoting, 85 

G 

Gauss elimination, 79 
Gauss quadrature, 234 
Gauss-Chebyshev, 240 
Gauss-Hermite, 238, 251, 253
Gauss-Jordan elimination, 89, 106 
Gauss-Laguerre, 239, 254, 255 
Gauss-Legendre, 235, 251, 255 
Gauss-Seidel iteration, 100, 103, 115 
Gaussian distribution, 24 
genetic algorithm, 338, 340 
Gerschgorin’s Disk Theorem, 380 
golden search, 321, 322 
gradient, 294, 328, 330, 47j 
graphic command, 491 


H 

Hamming method, 273

heat equation, 406, 412 

heat flow equation, 410 

Helmholtz’s equation, 402 

Hermite interpolating polynomial, 139

Hermite polynomial, 66, 238 

Hermitian, 466 

Hessenberg form, 395-397 

Hessian, 330, 472 

Heun’s method, 266 

hidden bit, 28 

Hilbert matrix, 88 

histogram, 23 

Householder, 392-395 

hyperbolic PDE, 401, 414, 430, 440, 453 


I

IDFT, 151

IEEE 64-bit floating-point number, 28 
ilaplace, 280 
ill-condition, 88 

implicit backward Euler method, 407, 452 
improper integral, 248, 249 
inconsistency, 83, 85 
independent eigenvectors, 373 

interior node, 425 

interpolation, 117, 119, 133, 141, 161 
2-dimensional, 141
interpolation by using DFS, 155 
interpolation function, 420 
inverse matrix, 92, 465 
inverse power method, 380, 381 
IVP (initial value problem), 263, 284 

J 

Jacobi iteration, 98 
Jacobi method, 381-384 
Jacobian, 191, 472, 484 

K 

keyboard input, 2 

L 

Lagrange coefficient polynomial, 118 
Lagrange multiplier method, 74, 343, 344 
Lagrange polynomial, 117, 118 
Laguerre polynomial, 239 
Laplace transform, 278, 280, 473 
Laplace’s Equation, 402, 404, 427, 435, 442 
largest number in MATLAB, 27 
leakage, 155, 174 

least squares (LS), 144, 165, 169, 171, 351, 354
Legendre polynomial, 236
length of arc/curve, 257 
limit, 483 

linear equation, 71, 79 
linear programming (LP), 355, 361 
logical operator, 25 
loop, 26 

loop iteration, 39 

Lorenz equation, 297 

loss of significance, 31, 32 

LSE (least squares error), 73 

LU decomposition (factorization), 92 

M 

mantissa field, 28 
mat-file, 2 

mathematical functions, 10 
matrix, 15, 463 

matrix inversion lemma, 78, 469 

mean value theorem, 461 

mesh, 11, 48, 49, 431-444 

midpoint rule, 222

minimum-norm solution, 73

mixed boundary condition, 287, 306, 308 

modal matrix, 373-376 

mode, 285, 377-378, 386, 432, 434 

modification formula, 272, 274 

mu law, mu-inverse law, 53, 335 

N 

negligible addition, 31 
Nelder-Mead Algorithm, 325 
nested computing, 38, 121 
nested (calling) routine, 40 
Neumann boundary condition, 404, 431, 447, 
448, 451 

Newton method, 186, 188, 191, 330, 332 
Newton polynomial, 119 
nonlinear BVP, 312 
nonlinear least squares (NLLS), 352 
nonnegative least squares (NLS), 355 
norm, 58 

normal (Gaussian) distribution, 24 
normalized range, 29 
null space, 73, 467 
numerical differentiation, 209, 244 
numerical integration, 222, 247, 249 

O 

on-line recursive computation of DFT, 176 
orthogonal, 382, 395, 466 
orthonormal, 382, 385 
over-determined, 75 
overflow, 34, 64 


P 

Pade approximation, 129, 160 
parabolic PDE, 406, 410, 412, 414, 430, 438,
449 

two-dimensional PDE, 412 
parallelepiped, 389 
parallelogram, 388 

parameter passing through VARARGIN, 45 
parameter sharing via GLOBAL, 44 
partial differential equation (PDE), 401 
partial pivoting, 81,85,105 
path, 1 

PDE mode, 434 
PDEtool, 429-431, 435, 456 
penalty, 346-349, 362, 366 
permutation, 94, 467 
persistent excitation, 169 
physical meaning of eigenvalues and 
eigenvectors, 385 
pivoting, 85-88, 105-106 
plot, 6-11 
Plot mode, 440 

Polak-Ribiere (PR) method, 332, 333 
polynomial approximation, 124
polynomial curve fitting by least squares, 146, 
169 

polynomial wiggle, 124 
positive definite, 468 
predictor/corrector errors, 272 
projection operator, 74 
pseudo (generalized) inverse, 17, 73, 76 

Q 

QR decomposition (factorization), 97, 392-396 

quadratic approximation method, 323-325 

quadratic interpolation, 157 

quadratically convergent, 188 

quadrature, 222, 231, 234 

quantization error, 63, 212 

quenching factor, 335 

R 

rank, 467 

recursive, 40, 66, 176, 201, 228, 231 
recursive least square estimation (RLSE), 76, 
104 

redundancy, 83, 85 

regula falsi, 185 

relational operators, 25 

relative error, 33 

relaxation, 104, 115 

reserved constants/variables, 13 

Richardson’s extrapolation, 211, 216 

RLSE (Recursive Least Squares Estimation), 76 
robot path planning, 164
Romberg integration, 228-230 
rotation matrix, 382, 384 
round-off error, 31, 35, 212, 213 
row echelon form, 467 
row space, 467 
row switching, 91, 105 
Runge Phenomenon, 124 
Runge-Kutta (RK4), 267 
runtime error, 40 

S 

saddle point, 358 
sampling period, 151, 153, 172 
scalar product, 469 
scaled partial pivoting, 85, 105 
scaled power method, 378-379 
Schroder method, 202 
secant method, 189, 201 
self-calling, 201 

shifted inverse power method, 380 
shooting method, 287, 305, 309, 312 
shooting position, 288, 307 
similarity transformation, 373

Simpson’s rule, 222, 226 
simulated annealing, 334, 336 
sine function, 41, 51 
single_step, 494 

smallest positive number in MATLAB, 

Solve mode, 434 

SOR (successive over-relaxation), 104 
sparse, 489 

stability, 378, 386, 406-410, 415-416, 418, 
450 

state equation, 277, 281, 283, 295, 299 
steepest descent, 328, 330
Steffensen method, 201 
step-size, 212-215, 264-265, 269, 286, 328, 
332 

step-size dilemma, 213 
step_in, 494 


stiff, 284-286, 298-299, 386 
Sturm-Liouville (BVP) equation, 319 
surface area of revolutionary object, 258 
SVD (singular value decomposition), 98, 112 
symbolic, 193, 233, 280, 481 
symbolic variable, 194, 481 
symmetric matrix, 381-382, 466 
Symmetric Diagonalization Theorem, 382 

T 

Taylor series theorem, 462, 485 
temperature, 404, 406, 412, 435, 438 
term-wise operation, 15, 52 
Toeplitz matrix, 390 
trapezoidal rule, 222, 225, 226 
tri-diagonal, 107, 108 
truncation error, 31, 212, 213 
two-dimensional interpolation, 141, 168 

U 

unconstrained optimization, 321, 350 
unconstrained least squares, 355 
underdetermined, 72
underflow, 34, 64 

uniform probabilistic distribution, 22 
unitary, 466 

un-normalized range, 29 
V 

Van der Pol equation, 285, 296 
vector, 15, 469 

vector differential equation, 277, 284 
vector operation, 39 
vector product, 469 
vibration, 416, 418, 440 
volume, 243, 258 

W 

wave equation, 414, 416-418, 453 
weighted least-squares (WLS), 145, 147, 171

Z

zero-padding, 151 


INDEX FOR MATLAB 
ROUTINES 


(cf) A/C/E/P/S/T stand for Appendix/Chapter/Example/Problems/Section/Table, respectively. 

(cf) The routines whose name starts with a capital letter are constructed in this book. 

(cf) A program named “nmijk.m” can be found in Section i.j-k. 

Name 

Place 

Description 

abmc 

S6.4-1 

Predictor/Corrector coefficients in Adams-Bashforth- 
Moulton ODE solver 

adapt_Smpsn() 

S5.8 

integration by the adaptive Simpson method

adc1()

P1.10

AD conversion 

adc2() 

P1.10

AD conversion 

axis() 

S1.1-4

specify axis limits or appearance 

backslash(\) 

P1.14

left matrix division 

backsubst()

S2.4-1

backward substitution for upper-triangular matrix
equation

bar()/barh() 

S1.1-4

a vertical/horizontal bar chart 

bisct() 

S4.2 

bisection method to solve a nonlinear equation 

break 

S1.1-9

terminate execution of a for loop or while loop 

bvp2_eig() 

P6.11

solve an eigenvalue BVP2 

bvp2_fdf()

S6.6-2 

FDM (Finite difference method) for a BVP 

bvp2_fdfp()

P6.6 

FDM for a BVP with initial derivative fixed 

bvp2_shoot()

S6.6-1 

Shooting method for a BVP (boundary value problem) 

bvp2_shootp()

P6.6 

Shooting method for a BVP with initial derivative fixed 

bvp2m_shootp()

P6.7 

Shooting method for BVP with mixed boundary 
condition I 

bvp2m_fdfp() 

P6.7 

FDM for a BVP with mixed boundary condition I 

bvp2mm_shootp()

P6.8 

Shooting method for BVP with mixed boundary 
condition II 

bvp2mm_fdfp()

P6.8 

FDM for a BVP with mixed boundary condition II 

bvp4c()

S6.6-2, P6.7~10

BVP solver

ceil() 

S1.1-5 (T1.3)

round toward +infinity

cheby()

S3.3 

Chebyshev polynomial approximation 

chol() 

S2.4-2 

Cholesky factorization 
clear

S1.1-2

remove items from workspace, freeing up system 
memory 

clf

S1.1-4

clear current figure window 

compare_DFT_FFT

S3.9-1 

compare DFT with FFT 

cond() 

S2.2-2 

condition number 

constr() 

AH 

constrained minimization (in MATLAB 5.x) 

contour() 

S1.1-5

2-D contour plot of a scalar-valued function of 2-D 
variable 

conv() 

S1.1-6

convolution of two sequences 

or multiplication of two polynomials 

cspline() 

S3.5 

cubic spline interpolation 

CtFT1()

P1.26

Continuous-time Fourier Transform

curve_fit()

P3.9

weighted least-squares curve fitting

c2d_steq()

S6.5-2 

continuous-time state equation to discrete-time one 

dblquad() 

S1.1-7

2-D (double) integral 

diag() 

S5.3 

construct a diagonal matrix or get diagonals of a matrix 

difapx() 

S5.4, AG2-3 

difference approximation for numerical derivatives 

diff() 

S5.10, P5.14 

differences between neighboring elements in an array 

disp() 

S1.1-3

display text or array onto the (monitor) screen 

docheby 

S3.3 

approximate by Chebyshev polynomial 

do_condition

S2.2-2 

condition numbers for ill-conditioned matrices 

do_csplines 

S3.5 

interpolate by cubic splines 

do_FFT

S3.9-1

do FFT (Fast Fourier Transform)

do_gauss

S2.2-1

do Gauss elimination

do_hermit

S3.6 

do Hermite polynomial interpolation 

do_interp2 

S3.7 

do 2-dimensional interpolation 

do_lagranp 

S3.1 

do Lagrange polynomial interpolation 

do_lagnewch 

S3.3 

try Lagrange/Newton/Chebyshev polynomial 

do_lu_dcmp 

S2.4-1 

do LU decomposition (factorization) 

do_MBK

P6.4 

simulate a mass-damper-spring system 

do_newtonp 

S3.2 

do Newton polynomial interpolation 

do_newtonp1

S3.2 

do Newton polynomial interpolation 

do_pade() 

S3.4 

do Pade (rational polynomial) approximation 

do_polyfits()

S3.8-2 

do polynomial curve fitting 

do_RDFT

P3.20

do recursive DFT

do_quiver

P6.0

use quiver() to plot the gradient vectors

do_rlse

S2.1-4 

do recursive least-squares estimation 

do_wlse 

S3.8-2 

do weighted least-squares curve fitting 

double() 

AG1 

convert to double-precision 

draw_MBK

P6.4 

simulate a mass-damper-spring system 

dsolve() 

S6.6-2, 

P6.3, AG5 

symbolic differential equation solver 

eig() 

S8.1 

eigenvalues and eigenvectors of a matrix 

eig_Jacobi() 

S8.4 

find the eigenvalues/eigenvectors of a symmetric matrix 

eig_power() 

S8.3 

find the largest eigenvalue & the corresponding 
eigenvector 

eig_QR() 

P8.7 

find eigenvalues using QR factorization 

eig_QR_Hs() 

P8.7 

find eigenvalues using QR factorization via Hessenberg 

else 

SI.1-9 

for conditional execution of statements 

elseif 

SI.1-9 

for conditional execution of statements 

end 

SI.1-9 

terminate for/while/switch/try/if statements or indicate the last index

err_of_sol_de() 

P6.9 

evaluate the error of solution of differential eq. 

eval() 

S1.1-5 (T1.3)

evaluate a string containing a MATLAB expression

eye()

S1.1-7

identity matrix (having 1/0 on/off its diagonal)

ezplot()

S1.3-6

easy plot

falsp()

S4.3 

false position method to solve a nonlinear equation 

fem_basis_ftn()

S9.4

coefficients of each basis function for subregions

fem_coef()

S9.4

coefficients for subregions

feval()

S1.1-6

evaluation of a function defined by inline() or in an M-file

find()

P1.10

find indices of nonzero (true) elements 

findsym() 

S4.7 

find the symbolic variables in a symbolic expression 

fix() 

S1.1-5 (T1.3)

round towards zero 

fixpt() 

S4.1 

fixed-point iteration to solve a nonlinear equation 

fliplr() 

S1.1-7

flip the elements of a matrix left-right

flipud()

S1.1-7

flip the elements of a matrix up-down

floor()

S1.1-5 (T1.3)

round toward -infinity

fminbnd() 

S7.1-2 

unconstrained minimization of one-variable function 

fmincon() 

S7.3-2 

constrained minimization 

fminimax() 

S7.3-2 

minimize the maximum of vector/matrix-valued function 

fminsearch() 

S7.2-2, 7.3-1 

unconstrained nonlinear minimization (Nelder-Mead) 

fminunc() 

S7.2-2, 7.3-1 

unconstrained nonlinear minimization (gradient-based) 

for 

S1.1-9

repeat statements a specific number of times

format

S1.1-3

control display format for numbers 

forsubst() 

S2.4-1 

forward substitution for lower-triangular matrix equation 

fprintf() 

S1.1-3, P1.2

write formatted data to screen or file 

fsolve() 

S4.6,4.7,E4.3 

solve nonlinear equations by a least squares method 

gauseid() 

S2.5-2 

Gauss-Seidel method to solve a system of linear 
equations 

gauss() 

S2.2-2 

Gauss elimination to solve a system of linear equations 

gauss_legendre() 

S5.9-1 

Gauss-Legendre integration 

gausslp()

S5.9-1

grid points of Gauss-Legendre integration formula

gausshp()

S5.9-2 

grid points of Gauss-Hermite integration formula 

genetic() 

S7.1-8 

optimization by the genetic algorithm (GA) 

ginput() 

S1.1-4

input the x- & y-coordinates of point(s) clicked by mouse

global

S1.3-5

declare global variables

gradient()

P6.0

numerical gradient

grid on/off

S1.1-4

grid lines for 2-D or 3-D graphs

gtext()

S1.1-4

mouse placement of text in a 2-D graph 

heat_exp() 

S9.2-1 

explicit forward Euler method for parabolic PDE (heat 
eq) 

heat_imp() 

S9.2-2 

implicit backward Euler method for parabolic PDE (heat 
eq) 

heat_CN() 

S9.2-3 

Crank-Nicholson method for parabolic PDE (heat eq) 

heat2_ADI() 

S9.2-4 

ADI method for parabolic PDE (2-D heat equation) 

help 

S1.1-1

display help comments for MATLAB routines 

hermit() 

S3.6 

Hermite polynomial interpolation 

hermitp()

S5.9-2

Hermite polynomial

hermits()

S3.6

multiple Hermite polynomial interpolations

hessenberg()

P8.5

transform a matrix into an almost upper-triangular one

hist()

S1.1-4, 1.1-8

plot a histogram

hold on/off

S1.1-4

hold on/off current graph in the figure 

housholder() 

P8.4 

Householder matrix to zero-out the tail part of a vector 

ICtFT1()

P1.26 

Inverse Continuous-time Fourier Transform 

if 

S1.1-9

for conditional execution of statements

inline()

S1.1-6

define a function inside the program

inpolygon()

S9.4

is the point inside a polygonal region?

input()

S1.1-3

request and get user input 

int() 

S5.8, AG2 

numerical/symbolic integration 

interp1()

S3.5 

1-D interpolation 

interp2() 

S3.7 

2-D interpolation 

intrp1()

P3.10 

1-D interpolation 

intrp2() 

S3.7 

2-D interpolation 

interpolate_by_DFS

S3.9-3 

interpolation using DFS 

int2s() 

S5.10, P5.14 

2-D (double) integral 

inv() 

S1.1-7

the inverse of a matrix

isempty()

P1.10

is it empty (no value)?

isnumeric()

P1.10

does it have a numeric value?

jacob() 

S4.6 

Jacobian matrix of a given function 

jacob1()

P5.3

Jacobian matrix of a given function

jacobi()

S2.5-1

Jacobi iteration to solve a system of linear equations

Jkb()

P1.21

Bessel function of the 1st kind of kth order

lagranp()

S3.1 

Lagrange polynomial interpolation 

lgndrp() 

S5.9-1 

Legendre polynomial 

length() 

S1.1-7

the length of a vector (sequence)

limit()

AG2-2

limit of a symbolic expression

lin_eq()

S2.1-3

solve linear equation(s)

linprog()

S7.3-3

solve a linear programming (LP) problem

load

S1.1-2, 4

read variable(s) from file

loglog()

S1.1-4

plot data with logarithmic scales for both the x-axis and the y-axis

lookfor

S1.1-1

search for string in the first comment line in all M-files 

lscov() 

S3.8-1 

weighted least-squares with known (error) covariance 

lsqcurvefit()

S3.8-3

weighted nonlinear least-squares curve fitting

lsqlin()

S7.3-1

solve a linear least squares (LLS) problem

lsqnonlin()

S7.3-1

solve a non-linear least squares (NLLS) problem

lsqnonneg()

S7.3-2 

find a non-negative least squares (NNLS) solution 

lu() 

S2.4-1 

LU decomposition (factorization) 

lu_dcmp() 

S2.4-1 

LU decomposition (factorization) 

max() 

S1.1-7

find the maximum element(s) of an array

mesh()

S1.1-5, 3.7

plot a mesh-type graph of f(x, y)

meshgrid()

S1.1-5, 3.7

grid points for plotting a mesh-type graph

min()

S1.1-7

find the minimum element(s) of an array

mkpp()

P1.11

make a piece-wise polynomial

mod()

S1.1-5 (T1.3)

remainder after division 

mulaw() 

P1.9 

μ-law

muinv()

S7.1-7

μ-inverse law

multiply_matrix()

P1.12

matrix multiplication

newton()

S4.4

Newton method to solve a nonlinear equation

newtonp()

S3.2 

Newton polynomial interpolation 

newtons() 

S4.6 

Newton method to solve a system of nonlinear equations

norm()

P1.13

norm of vector/matrix 

ode_ABM() 

S6.4-1 

solve a state equation by Adams-Bashforth-Moulton 

ode_Euler() 

S6.1 

solve a state equation by Euler’s method 

ode_Ham() 

S6.4-2 

solve a state equation by Hamming ODE solver 

ode_Heun() 

S6.2 

solve a state equation by Heun’s method 

ode_RK4() 

S6.3 

solve a state equation by Runge-Kutta method 

ode23()/ode45()/ode113()

S6.4-3

ODE solver

ode15s()/ode23s()/ode23t()/ode23tb()

S6.5-4

solve (stiff) ODEs

ones() 

S1.1-7

construct an array of ones

opt_gs() 

S7.1-1 

optimization by golden-section search

opt_quad()

S7.1-2 

optimization by quadratic approximation 

opt_Nelder()

S7.1-3

optimization by the Nelder-Mead method

opt_steep()

S7.1-4

optimization by steepest descent

opt_conjg() 

S7.1-6 

optimization by the conjugate gradient method

padeap()

S3.4

Padé approximation

pdetool 

S9.4 

start the PDE toolbox GUI (graphical user interface) 

pinv() 

S1.1-7, 2.1

pseudo-inverse (generalized inverse) 

plot() 

S1.1-4,5

linear 2-D plot 

plot3() 

S1.1-5

linear 3-D plot 

poisson() 

S9.1 

central difference method for elliptic PDE (Poisson’s eq) 

polar() 

S1.1-4

plot polar coordinates in a Cartesian plane with polar grid

poly_der()

P1.11

derivative of polynomial

polyder() 

P1.11

derivative of polynomial 

polyfit()

P3.13 

polynomial curve fitting 

polyfits()

S3.8-2 

polynomial curve fitting 

polyint()

P1.11

integral of polynomial 

polyval() 

S1.1-6, 3.8-2

evaluate a polynomial 

ppval() 

P1.11

evaluate a set of piece-wise polynomials 

pretty()

P3.1, AG2

print a symbolic expression in typeset-like form

prod() 

S1.1-7

product of array elements 

qr()

S2.4-2 

QR factorization 

qr_hessenberg() 

P8.6 

QR factorization of Hessenberg form by Givens rotation 

quad() 

S5.8 

numerical integration 

quadl() 

S5.8 

numerical integration 

quiver() 

P6.0 

plot gradient vectors 

quiver3() 

P6.0 

plot normal vectors on a surface 

rand() 

S1.1-8

uniform random number generator 

randn() 

S1.1-8

Gaussian random number generator 

rational_interpolation() 

P3.6 

rational polynomial interpolation 

repetition() 

P1.14

repetition of subsequences 

reshape() 

S1.1-7

reshape a matrix into one with the given numbers of rows/columns

residue() 

P1.11

partial fraction expansion of a Laplace-transformed rational function

residuez()

P1.11

partial fraction expansion of a z-transformed rational function

rlse_online()

S2.1-4

on-line recursive least-squares estimation

rmbrg()

S5.7

integration by the Romberg method

robot_path 

P3.9 

determine a robot path using cubic splines

roots() 

P1.11

roots of a polynomial equation 

round() 

S1.1-5 (T1.3)

round to the nearest integer

rot90() 

S1.1-7

rotate a matrix by 90 degrees 

save 

S1.1-2

save variable(s) to a file

secant() 

S4.5 

secant method to solve a nonlinear equation 

semilogx() 

S1.1-4

plot data with a logarithmic scale for the x-axis

semilogy()

S1.1-4

plot data with a logarithmic scale for the y-axis

size() 

S1.1-7

return the numbers of rows/columns of a 1-D/2-D/3-D array

sim_anl() 

S7.1-7 

optimization by simulated annealing (SA) 
simple()

AG2-3

find the simplest form of a symbolic expression

simplify()

AG2-3

simplify a symbolic expression

smpsns()

S5.6

integration by Simpson's rule

smpsns_fxy()

S5.10, P5.15

1-D integration of a function f(x, y) along y

solve() 

P3.1, S4.7, AG4

symbolic solution of algebraic equations 

sort() 

S1.1-4

arrange the elements of an array in ascending order

spline() 

S3.5 

cubic spline 

sprintf() 

S1.1-4

write formatted data to a string

stairs()

S1.1-4

stair-step plot of the zero-order-hold signal of sampled-data systems

stem() 

S1.1-4

plot discrete sequence data 

subplot() 

S1.1-4, 1.1-7

divide the current figure into rectangular panes 

subs() 

AG1 

substitute into a symbolic expression

sum() 

S1.1-7

sum of the elements of an array

surface() 

P6.0 

plot a surface-type graph of f(x, y)

surfnorm() 

P6.0 

generate vectors normal to a surface 

svd() 

S2.4-2 

singular value decomposition 

switch 

S1.1-9

switch among several cases 

syms 

P3.1, S4.7, AG

declare symbolic variable(s) 

sym2poly() 

S5.3, AG2 

extract the coefficients of a symbolic polynomial expression

taylor() 

S5.3, AG2 

Taylor series expansion 

text() 

S1.1-4

add text at the specified location on the graph

title()

S1.1-4

add a title to the current axes

trid() 

S6.6-2 

solve a tri-diagonal system of linear equations 

trimesh() 

S9.4 

plot a triangular-mesh-type graph 

trpzds() 

S5.6 

integration by the trapezoidal rule

varargin() 

S1.3-6

variable length input argument list 

view() 

S1.1-5, P1.4

3-D graph viewpoint specification 

vpa() 

AG 

evaluate a double array with variable-precision arithmetic

wave() 

S9.3-1 

central difference method for hyperbolic PDE (wave eq) 

wave2() 

S9.3-2 

central difference method for hyperbolic PDE (2-D wave eq)

while

S1.1-9

repeat statements an indefinite number of times 

windowing()

P3.18 

multiply a sequence by the specified window sequence 

xlabel()/ylabel() 

S1.1-4

label the x-axis/y-axis 

zeros() 

S1.1-7

construct an array of zeros 

zeroing() 

P1.15

set every (kM-m)th element to zero